Hitachi Vantara Pentaho Community Forums

Thread: Order of data flow

  1. #1
    Join Date
    Sep 2010

    Order of data flow


    I know that for some steps, such as "delete duplicate rows", it matters whether the data flow is ordered by certain columns.
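
    To illustrate why ordering matters for duplicate removal, here is a minimal Python sketch (not Kettle code). It assumes the step only compares each row with the one directly before it; the helper name `unique_adjacent` and the sample rows are made up for the example.

```python
from itertools import groupby

def unique_adjacent(rows, key):
    """Keep the first row of each run of consecutive rows sharing the same key."""
    return [next(group) for _, group in groupby(rows, key=key)]

# Hypothetical sample data: the key is the first field of each row.
rows = [("a", 1), ("b", 2), ("a", 3)]
by_key = lambda row: row[0]

# Unsorted input: the two "a" rows are not adjacent, so both survive.
unsorted_result = unique_adjacent(rows, by_key)

# Sorted input: the "a" rows become adjacent and collapse into one.
sorted_result = unique_adjacent(sorted(rows, key=by_key), by_key)

print(unsorted_result)
print(sorted_result)
```

    Under that assumption, the same rows give a different result depending on whether they were sorted by the key first.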

    What about steps that combine data flows, like "Join by key" or "Join (Cartesian Product)"? I have run several tests, and I'm almost sure that ordering the data flows by different columns gives different outputs. Does the order matter when using these steps?

    Thanks in advance.


  2. #2
    Join Date
    Feb 2011


    AFAIK, all kinds of joins/merges are affected by the order of the rows, not only in Kettle but in SQL, PostgreSQL and others as well. You should always sort the data before performing actions like these.

    BTW, both in joins and in deleting duplicates you need a key column. Always sort by this key!
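
    A rough Python sketch of a key-based merge join shows why sorting by the key matters. This is not Kettle's implementation; `merge_join` and the sample rows are hypothetical, and the sketch assumes the join walks both inputs once, in the order they arrive.

```python
def merge_join(left, right):
    """Inner join of two lists of (key, value) rows, assuming both are sorted by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1          # left key is behind, advance left
        elif lk > rk:
            j += 1          # right key is behind, advance right
        else:
            # Find the run of right rows with this key, then pair it
            # with every left row that has the same key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            while i < len(left) and left[i][0] == lk:
                for r in right[j:j_end]:
                    out.append((lk, left[i][1], r[1]))
                i += 1
            j = j_end
    return out

# Hypothetical sample data; the left input is deliberately not sorted by key.
left = [(2, "L2"), (1, "L1"), (3, "L3")]
right = [(1, "R1"), (2, "R2"), (3, "R3")]

unsorted_join = merge_join(left, right)               # silently misses key 1
sorted_join = merge_join(sorted(left), sorted(right))  # finds all three keys

print(unsorted_join)
print(sorted_join)
```

    With the unsorted left input, the walk moves past key 1 on the right before it reaches it on the left, so that match is lost without any error being raised.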
    Last edited by joao.ciocca; 03-17-2011 at 01:42 PM. Reason: BTW

  3. #3
    Join Date
    Nov 1999


    The "Join Rows (Cartesian product)" step doesn't care about the order in which the data is being offered as the complete product of all data sets is considered (hence the "Cartesian product" specification).
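
    As a small Python illustration of this point (again not Kettle code, and with made-up sample values), a Cartesian product contains the same combinations whatever order the inputs arrive in; only the order of the output rows can change.

```python
from itertools import product

# Hypothetical sample inputs.
a = [1, 2, 3]
b = ["x", "y"]

pairs = list(product(a, b))
pairs_reordered = list(product(reversed(a), reversed(b)))

# Same combinations overall, but emitted in a different sequence.
same_set = set(pairs) == set(pairs_reordered)
same_sequence = pairs == pairs_reordered

print(same_set, same_sequence)
```

    If a test compares outputs row by row, a difference in row sequence like this could look like a different result even though the set of rows is identical.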

