Hitachi Vantara Pentaho Community Forums

Thread: Result Row behavior across transformations

  1. #1
    Join Date: May 2009
    Posts: 4

    Result Row behavior across transformations

    I'm looking for a little bit of clarity about how result rows are treated when you chain transformations together.

    I've created a sample job that hopefully illustrates my issue. There are three transformations in the job. The first lists all the files in the transformations directory and copies the rows to the result. The second transformation gets the rows out of the result and logs each row. The third transformation is identical to the second (i.e. it gets the rows out and logs each one).

    If you run the job, the second transformation does exactly what I expected: it logs a single line for each job or transformation file. The third transformation, on the other hand, logs only the last row from the original results, and it logs that row twice.

    I'm not sure whether this is a bug or I'm misunderstanding what happens to result rows as they pass through transformations. A couple of notes: I'm using "Execute for every input row" on the logging transformations, and Kettle seems to behave the same way in both 3.2 stable and the Subversion trunk.

    Thanks in advance.
    Attached Files
    Last edited by akilker; 08-07-2009 at 05:09 PM.

  2. #2
    Join Date: May 2009
    Posts: 4

    I believe I've found the cause. When you select "Execute for every input row", the result rows are cleared for every iteration of the transformation.

    Line 588 in JobEntryTrans:

    Code:
        if (execPerRow)
        {
            result.getRows().clear(); // Otherwise we double the amount of rows every iteration in the simple cases.
        }
    Just after that block, the code tries to put the result rows back. But because the rows are cleared on every iteration, only the latest row is restored, so when the loop finishes only the last result row is left. I couldn't determine exactly why that row comes out twice instead of once.
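    To make the mechanism concrete, here is a minimal, hypothetical Java model of that loop. It is not Kettle's code: `Result` is a simplified stand-in for the job entry's result object, and `runPerRow` and the file names are made up for illustration (Java 9+ for `List.of`). A logging transformation that simply passes its row through is run once per input row, and the result rows are cleared at the start of each iteration, as in the excerpt above. It reproduces the "only the last row survives" symptom; it does not try to explain why that row shows up twice.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of per-row execution with clearing; not Kettle's actual implementation. */
public class PerRowClearDemo {

    // Simplified stand-in for the job entry result: just a mutable list of rows.
    static final class Result {
        final List<String> rows = new ArrayList<>();
    }

    // Runs a pass-through logging "transformation" once per input row.
    static Result runPerRow(List<String> inputRows) {
        Result result = new Result();
        for (String row : inputRows) {
            // Mirrors the excerpt: if (execPerRow) result.getRows().clear();
            result.rows.clear();
            System.out.println("log: " + row);
            // The iteration's output (here, just the row it received) is put back,
            // but everything from earlier iterations was already discarded.
            result.rows.add(row);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> files = List.of("a.ktr", "b.ktr", "c.kjb"); // hypothetical file names
        Result afterSecond = runPerRow(files);
        System.out.println("rows handed to next entry: " + afterSecond.rows);
    }
}
```

    Running `main` logs all three rows but hands only `[c.kjb]` to the next job entry.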

    Something isn't quite right here. Either the result rows should come out of the transformation exactly as they went in, or you should perhaps get only the last result row. I'd prefer the former.
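    For comparison, here is a sketch of the preferred behavior, again as a hypothetical model rather than a patch against JobEntryTrans (`runOnce`, `runPerRow`, and the file names are illustrative only). Each iteration's output is kept in its own list, so rows are not doubled, and is then appended to an accumulated list, so earlier rows are not thrown away.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: accumulate result rows across per-row iterations instead of keeping only the last. */
public class PerRowAccumulateDemo {

    // Stand-in for one execution of the transformation on a single input row;
    // a pass-through logger simply returns the row it received.
    static List<String> runOnce(String row) {
        System.out.println("log: " + row);
        return List.of(row);
    }

    static List<String> runPerRow(List<String> inputRows) {
        List<String> accumulated = new ArrayList<>();
        for (String row : inputRows) {
            // Fresh per-iteration output avoids doubling; appending it keeps earlier rows.
            List<String> iterationRows = runOnce(row);
            accumulated.addAll(iterationRows);
        }
        return accumulated;
    }

    public static void main(String[] args) {
        List<String> files = List.of("a.ktr", "b.ktr", "c.kjb"); // hypothetical file names
        System.out.println("rows handed to next entry: " + runPerRow(files));
    }
}
```

    With the same three input rows, all of them come out the other side in their original order, which is what the third transformation would need in order to log each file exactly once.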

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.