Hitachi Vantara Pentaho Community Forums

Thread: Iterate result rows between jobs and pass on result rows to next job

  1. #1
    Join Date
    Aug 2016
    Posts
    290

    Iterate result rows between jobs and pass on result rows to next job

    Let's say there is a set of rows passing between jobs: Job A creates the rows and sends them to Job B, which reads them, does some work based on them, and sends the rows on to Job C...

    Now, at one particular job in this chain, a performance requirement means we must execute one row at a time. This is done with the normal loop, by ticking the "Execute for every input row?" checkbox on the job entry. But how can this job execute once per input row and also send all the rows to the next job?

    Let's say the result rows are a list of persons:

    JobA (create list of persons, send as result rows) --> JobB (read list, do some work, send list) --> JobC (loop through list, then send entire list) --> JobD

    How can JobD receive the entire list of result rows after JobC has iterated over it?
    Last edited by Sparkles; 10-12-2017 at 03:26 AM.

  2. #2
    Join Date
    Apr 2014
    Posts
    18

    It is hard to understand your logic without looking at your data. Try to attach some sample data and your .kjb/.ktr files.

    Quote Originally Posted by Sparkles
    How can JobD receive the entire list of result rows after JobC has iterated over it?

  3. #3
    Join Date
    Aug 2016
    Posts
    290

    Sample data:

    Name, address, telephone
    Robert, Street 1, 555123
    Linda, Road 2, 555456

    The data is really irrelevant; it could be anything. You have a list of length n.

    Then you have a series of jobs, each having to access this list:

    JobA --> JobB --> JobC --> JobD --> ...

    This is very easy to achieve: each job starts with a "Get rows from result" step and ends with a "Copy rows to result" step to share the list with the job after it. But now I want JobC to access the result rows in a loop while also sharing the list unmodified with JobD. Suggestions?
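
    To illustrate, each job in the chain currently runs a transformation shaped roughly like this:

        Get rows from result --> (steps doing the work) --> Copy rows to result

    That works fine for plain pass-through; the problem is only JobC, which must both loop over the rows and pass the list on.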

  4. #4
    Join Date
    Jun 2012
    Posts
    5,534

    Wouldn't a temporary file be a clean way to achieve what you want?
    You can't have your cake and eat it, after all.
    The result stream isn't transitive by design - it must be recreated for each hop.
    In your example, even hops A->B and B->C don't share the same result stream, or do they?
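    Off the top of my head, JobC could then be laid out like this (untested sketch; the temp file name is made up):

        START
          --> Transformation 1: Get rows from result --> Text file output (/tmp/persons.txt)
          --> Job: per-row work, with "Execute for every input row?" checked
          --> Transformation 2: Text file input (/tmp/persons.txt) --> Copy rows to result

    Transformation 2 recreates the result stream, so JobD receives the full list again.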
    Last edited by marabu; 10-19-2017 at 05:30 AM. Reason: replace transient by transitive
    So long, and thanks for all the fish.

  5. #5
    Join Date
    Aug 2016
    Posts
    290

    A->B and B->C may not share the same objects, but the contents will be identical.
    I think putting temporary data in a file or database is just a really messy approach. In my mind, the limitation on sharing data between jobs and transformations is one of the biggest flaws of Pentaho ETL.
    For example, I would prefer to have any number of result streams, not just one, with full control over their creation, modification and access in any job/transformation.
    In code this would be super easy: keep one array in memory, loop through it, and later have some other object use the same array.
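    For instance, in plain Java this is all it takes (a throwaway sketch; the class name and printouts are made up, the rows are the sample data from above):

        import java.util.Arrays;
        import java.util.List;

        public class ResultRowsSketch {
            public static void main(String[] args) {
                // JobA: build the list once, in memory
                List<String[]> persons = Arrays.asList(
                        new String[] { "Robert", "Street 1", "555123" },
                        new String[] { "Linda", "Road 2", "555456" });

                // JobC: execute once per row
                for (String[] p : persons) {
                    System.out.println("working on " + p[0]);
                }

                // JobD: the same, unmodified list is still available
                System.out.println("passing on " + persons.size() + " rows");
            }
        }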

    When working with big data, this becomes a real problem. You don't want to write millions of rows to a file just so you can access them in a different transformation. Instead, you pretty much have to do the entire operation on the big data in a single transformation.
    Last edited by Sparkles; 10-19-2017 at 04:59 AM.

  6. #6
    Join Date
    Apr 2008
    Posts
    4,696

    What about looking at either the Transformation Executor or the Job Executor?
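
    The Job Executor step runs a job once for every incoming row, so the per-row looping can happen inside a transformation while the row stream itself stays available. Roughly (untested sketch; the sub-job name is made up, and the hop out of "Get rows from result" must copy rows to both targets):

        Get rows from result
            |--> Job Executor (runs per_person.kjb once per incoming row)
            |--> Copy rows to result (the unchanged list continues on to JobD)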
