Hitachi Vantara Pentaho Community Forums

Thread: Copy rows to result behavior

  1. #1
    Join Date
    Jan 2013
    Posts
    26

    Copy rows to result behavior

    I'm following the guide at https://www.clearpeaks.com/copy-rows...ult-in-kettle/. In one transformation I use the REST client to retrieve a result set, then use Copy rows to result to pass that result set to another transformation, where I access the individual fields as parameters. This mostly works, but I'm finding that certain parameters that should be null instead contain the value of the previous field in the result set. Is this expected behavior? Any ideas on how to overcome it? I'm using PDI 8.2.0.0-342, and I have 'Execute every input row' checked.
    Last edited by clarkddc; 02-19-2019 at 11:26 AM.

  2. #2
    Join Date
    Aug 2016
    Posts
    289

    I have never found any satisfying documentation on the dirty details of result rows. It seems like nobody knows exactly how they work. The only way to find out is to test it yourself, and that's what I did. What I found was similar to what you describe here.

    For example, suppose a transformation gets the result rows (from a previous job/trans), filters them (true/false), and copies only the true rows to the result set for subsequent jobs/trans. This only works IF there is at least 1 true row! If there are no true rows, all the input rows (true and false) are sent to the subsequent jobs/trans.

    Which means that if you wanted to have this kind of rows filter for result rows, you would need to implement some sort of dummy null "empty" row so that you could ALWAYS send 1 row to "copy rows to result" even if you filtered away all of them.
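
A toy model of the behavior described above may make it easier to reason about. It is only a sketch built from the observation in this post, not PDI's actual implementation: the class `ToyResultBuffer`, the function `run_filter_transformation`, and the `status` field are all made up for illustration. The one assumption that matters is that zero incoming rows leave the previous result rows untouched.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


class ToyResultBuffer:
    """Toy stand-in for the result rows shared between job entries (not PDI code)."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def copy_rows_to_result(self, incoming: List[Row]) -> None:
        # Assumption taken from the observation above: the buffer is only
        # overwritten when at least one row arrives; zero rows leave it as-is.
        if incoming:
            self.rows = list(incoming)


def run_filter_transformation(buffer: ToyResultBuffer, keep_status: str) -> None:
    """Read the result rows, keep rows whose 'status' matches, copy them back."""
    kept = [row for row in buffer.rows if row["status"] == keep_status]
    buffer.copy_rows_to_result(kept)


def fresh_buffer() -> ToyResultBuffer:
    """Build a buffer holding one 'true' row and one 'false' row."""
    buffer = ToyResultBuffer()
    buffer.copy_rows_to_result([
        {"id": "1", "status": "ok"},
        {"id": "2", "status": "failed"},
    ])
    return buffer


# At least one row passes the filter: downstream sees only the kept row.
some_match = fresh_buffer()
run_filter_transformation(some_match, "ok")

# No row passes the filter: both original rows are still in the buffer.
no_match = fresh_buffer()
run_filter_transformation(no_match, "missing")
```

In this model `some_match.rows` holds only the row with id 1, while `no_match.rows` still holds both input rows, which matches the "all the input rows are sent on" symptom.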

    This is messy and I have yet to find info on this level in any books or articles. Since result rows operate partially on job level also, it's difficult to debug and see what's actually going on.

  3. #3
    Join Date
    Apr 2008
    Posts
    4,690

    Originally posted by Sparkles:
    "you would need to implement some sort of dummy null "empty" row so that you could ALWAYS send 1 row to "copy rows to result" even if you filtered away all of them."

    Hint: the Detect empty stream step will send a single row of all nulls if no rows reach it.
    It would probably be a good feature request ( http://jira.pentaho.com ) to have an optional "Clear Result Rows at initialization" setting on the "Copy rows to result" step.

    I haven't dug deep into how result rows are implemented... I just figured out how they work. They are really only intended for carrying rows of data between transformations where you have stages of work to complete. They weren't designed for use across multiple transformations outside of that "stage of work" concept, so you may struggle to get them to play nicely.
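
The empty-stream hint above can be modelled in a few lines. This is a sketch of the idea only, not the step's source code: the function name `pad_empty_stream` and the field names are hypothetical, and rows are represented as plain dictionaries.

```python
from typing import Dict, Iterable, Iterator, List, Optional

Row = Dict[str, Optional[str]]


def pad_empty_stream(rows: Iterable[Row], field_names: List[str]) -> Iterator[Row]:
    """Pass rows through; if none arrive, emit one row with every field set to None."""
    saw_row = False
    for row in rows:
        saw_row = True
        yield row
    if not saw_row:
        # Guarantees that the next step always receives at least one row.
        yield {name: None for name in field_names}


fields = ["id", "status"]
padded = list(pad_empty_stream([], fields))
untouched = list(pad_empty_stream([{"id": "1", "status": "ok"}], fields))
```

Placed just before the copy to the result, a pad like this means the result rows are always overwritten, at the cost of the receiving side having to recognise the all-null row.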

  4. #4
    Join Date
    Aug 2016
    Posts
    289

    That sounds good, exactly as you describe it. I created a Jira ticket.

    It may very well not be intended for these sorts of things, but it is VERY common for a program to keep some sort of list/array in memory and use it in many different places, or even modify it. This, together with loops, is often a struggle in Spoon. And the exact behavior of the result rows is still a mystery, not documented in any book or article.

    One solution, as you say, is to pass a null row. But this is a "dirty" solution, since it requires null handling on the receiving end. The solution I went with was similar: I set a variable to true/false depending on whether all result rows were discarded or not, and had a simple evaluation controlling the flow afterwards.
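
Both workarounds in this post can be sketched together: a flag that records whether any real rows survived, plus a null check on the receiving end. This is an illustration under stated assumptions rather than anything taken from PDI: `prepare_result`, `is_placeholder`, `downstream`, and the `id` field are hypothetical names, and the boolean stands in for the true/false variable.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]


def is_placeholder(row: Row) -> bool:
    """A row counts as the dummy placeholder when every field is None."""
    return all(value is None for value in row.values())


def prepare_result(kept: List[Row], field_names: List[str]) -> Tuple[List[Row], bool]:
    """Return the rows to copy to the result plus a flag saying whether they are real."""
    if kept:
        return kept, True
    # Nothing survived the filter: send one all-null row and flag it as such.
    return [{name: None for name in field_names}], False


def downstream(rows: List[Row], has_rows: bool) -> List[str]:
    """Collect ids from real rows; the flag short-circuits, the null check is a second guard."""
    if not has_rows:
        return []
    return [row["id"] for row in rows
            if not is_placeholder(row) and row["id"] is not None]


fields = ["id", "status"]
empty_rows, empty_flag = prepare_result([], fields)
real_rows, real_flag = prepare_result([{"id": "7", "status": "ok"}], fields)

skipped = downstream(empty_rows, empty_flag)
processed = downstream(real_rows, real_flag)
```

Either guard alone is enough in this sketch; checking the flag avoids touching the rows at all, which is roughly what the simple evaluation in the job flow does.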

    Another solution is to have some place to dump the list (a db table or file), but this is also messy. Data that should just be available inside the program's memory is now spread across disk, and you need to handle it and clean it up too, in addition to the much worse performance of writing to disk and loading back into memory.
    Last edited by Sparkles; 02-20-2019 at 03:05 PM.

  5. #5
    Join Date
    Aug 2016
    Posts
    289

    I made a Jira ticket here:
    https://jira.pentaho.com/browse/PDI-17926

    I got a reply saying that the issue could not be reproduced. I added an example, but I'm currently using an older version (5.4.0.1-130). Would basic jobs and transformations be forward compatible with version 8.2? There are no fancy steps here and no database connections, only a basic job, transformation and steps.

    If anyone is using 8.2 or some other "fresh" version, please try out the example that produces this unwanted behavior:

    https://drive.google.com/open?id=11Q...oh3SH3jaIjGTgV

  6. #6
    Join Date
    May 2016
    Posts
    280

    Hi Sparkles,
    I've tested it with 8.2 and I can reproduce the issue, so it's a safe example for Jira.
    Regards
    OS: Ubuntu 16.04 64 bits
    Java: Openjdk 1.8.0_131
    Pentaho 6.1 CE

  7. #7
    Join Date
    Aug 2016
    Posts
    289

    Thanks a bunch!

    With some useful tips from you guys I managed to work around the problem, either by checking for <null> rows or by setting a variable if no rows were copied to the result. Now that I am aware of the behavior I can "control" it, but I think it would be useful to improve it in future versions.
    Last edited by Sparkles; 03-22-2019 at 07:50 AM.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.