Hitachi Vantara Pentaho Community Forums

Thread: Group By / Return Value cannot be found

  1. #1
    Join Date: Aug 2016, Posts: 290

    Group By / Return Value cannot be found

    I have a unique situation which is quite complicated, but let's give it a try. It concerns a transformation I made which reads fact rows from files, applies various filters and calculations, and then generates statistics using Memory Group By. This transformation handles large volumes of data, typically millions of rows. The calculated summaries are finally inserted into fact tables.

    The transformation splits into multiple sub-streams, and each sub-stream writes to its own fact table. So the contents of the files are manipulated in multiple ways and written to multiple fact tables.

    The bug:
    There is a bug in the "Memory Group By" step. It logs "Return value <random field name> can't be found in the input row.". The problem is that the field it can't find is some random temporary field used earlier in the transformation, one that is irrelevant and not referenced by the Memory Group By step at all. Perhaps due to a race condition or something similar, step/row metadata is leaking. The error happens sporadically, perhaps a few times per day. However, the program is executed every 5 minutes, exactly 288 times per day (12 five-minute intervals per hour, multiplied by 24 hours), so the crash frequency is about 1%.

    Relevant links describing the bug (no solutions):
    https://forums.pentaho.com/archive/i...p/t-95727.html
    https://stackoverflow.com/questions/...-the-input-row
    https://forums.pentaho.com/threads/2...the-input-row/

    The gateway step: Some of the sub-streams must be enabled using a variable. This is achieved with a Java code step; let's call it the "gateway step". If the variable is "false", the gateway step disables the sub-stream. Its purpose is simple: depending on the variable, either let rows pass or stop all rows. The error above only happens when the sub-stream is disabled, meaning no rows are even passed to Memory Group By!

    The gateway step must return true to avoid rows piling up. PDI can only hold a certain number of "intermediate" rows between steps before crashing, so the gateway step returns true to keep consuming them. But what if it also calls setOutputDone() to immediately disable all steps in the disabled sub-stream?

    I've had difficulty finding a detailed description of what exactly setOutputDone() does, but I'll try it and see whether the error disappears. Because the crash frequency is low, it may take some time to see any difference.
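    The gateway behaviour described above can be modelled outside PDI. The sketch below is plain Java, not Kettle code: GatewayStepSketch and its getRow(), putRow() and setOutputDone() methods are hypothetical stand-ins that only mimic the contract of a Java code step (processRow() returns true to be called again and false when the step is finished). When the sub-stream is disabled, it signals "output done" on the first row and then keeps consuming and discarding input, so upstream rows never pile up. It models only the row draining and early shutdown; it is not a fix for the Memory Group By error.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Single-threaded model of the "gateway step" pattern. The method names mirror
 * the Kettle ones, but this is a hypothetical stand-in, not the Kettle API.
 */
class GatewayStepSketch {

    private final Deque<Object[]> input;                     // rows arriving from upstream
    private final List<Object[]> output = new ArrayList<>(); // rows passed downstream
    private final boolean enabled;                           // value of the enabling variable
    private boolean outputDone = false;                      // downstream told "no more rows"

    GatewayStepSketch(Deque<Object[]> input, boolean enabled) {
        this.input = input;
        this.enabled = enabled;
    }

    private Object[] getRow() { return input.poll(); } // null once upstream is exhausted
    private void putRow(Object[] row) { output.add(row); }
    private void setOutputDone() { outputDone = true; }

    /** Returns true to be called again, false when the step is finished. */
    boolean processRow() {
        Object[] row = getRow();
        if (row == null) {
            // Upstream finished: close the output (once) and stop.
            if (!outputDone) setOutputDone();
            return false;
        }
        if (!enabled) {
            // Disabled sub-stream: close the output on the first row so downstream
            // steps can finish immediately, then keep returning true so the
            // remaining rows are still consumed (and discarded) instead of piling up.
            if (!outputDone) setOutputDone();
            return true;
        }
        putRow(row); // enabled: pass the row through unchanged
        return true;
    }

    void runToCompletion() {
        while (processRow()) {
            // the engine would keep calling processRow() in a loop like this
        }
    }

    List<Object[]> getOutput() { return output; }
    boolean isOutputDone() { return outputDone; }
    int remainingInput() { return input.size(); }

    public static void main(String[] args) {
        for (boolean enabled : new boolean[] {true, false}) {
            Deque<Object[]> rows = new ArrayDeque<>();
            for (int i = 0; i < 5; i++) rows.add(new Object[] {i});
            GatewayStepSketch step = new GatewayStepSketch(rows, enabled);
            step.runToCompletion();
            System.out.println("enabled=" + enabled
                + " forwarded=" + step.getOutput().size()
                + " drained=" + (step.remainingInput() == 0)
                + " outputDone=" + step.isOutputDone());
        }
    }
}
```

    Running main() prints one line per mode: "enabled=true forwarded=5 drained=true outputDone=true" and "enabled=false forwarded=0 drained=true outputDone=true". In both cases the input is fully drained; only the enabled gateway forwards rows.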

    Anyone have ideas here?
    Last edited by Sparkles; 05-27-2019 at 07:50 AM.

  2. #2
    Join Date: Apr 2008, Posts: 4,696

    Not that it's a lot of help, but...
    https://github.com/pentaho/pentaho-k...tep.java#L2569

  3. #3
    Join Date: Aug 2016, Posts: 290

    Thanks! Always useful.

    Anyway, so far it seems like the combination of returning true and calling setOutputDone() has solved my problem. The bug hasn't happened again; I will keep monitoring to see whether it's gone for good.

    Also, the combination seems to be the right choice both for stopping rows from congesting into the step and for immediately disabling the steps that come after it.

  4. #4
    Join Date: Aug 2016, Posts: 290

    Turns out setOutputDone() did NOT stop the Memory Group By error. It still happens a couple of times per day at random. It's still useful for immediately disabling steps, though. The Memory Group By error seems to happen before any data is processed.

  5. #5

    I don't see much detail in the other threads that were referenced in the first post. Any chance you can re-create the issue with sample data or a sample KTR that can be shared?

    What version of PDI are you using? Are you running it using the Pentaho Server scheduler, or via Kitchen/Pan? If you're running via Kitchen/Pan, it's not likely to be a memory leak issue: each instance of those utilities runs in its own Java process, so while memory leaks are still possible, they won't "build up" across subsequent executions.

  6. #6
    Join Date: Aug 2016, Posts: 290

    Thanks for the reply! The version used is 5.4.0.1-130. The transformation is executed by kitchen.sh, which is scheduled and run by a service, but it could just as well have been a cron job. No, this is not a memory issue. The same code runs without error on an hourly basis (generating hourly statistics), which handles a lot more data and requires more memory than the 5-minute statistics run where the crash happens (5 minutes' worth of data instead of 60 minutes' worth).

    I'm not sure whether there have been any improvements to the Memory Group By step in newer versions.

    It is close to impossible to share and recreate this scenario because:

    1) High database dependency. This transformation references several fact tables and relational data tables.
    2) Raw data file dependency. This transformation normally reads huge files. The contents of these files are highly sensitive, and I don't have any way of making dummy data. However, the crash may well occur regardless of file size.
    3) External Java (.jar) library to read the data files.
    4) The transformation is very complex: maybe 70 or more steps, including a sub-transformation for complex logging.
    5) Low crash frequency. This is hard to recreate because the crash only happens 1, 2 or 3 times per day, while the program executes every 5 minutes (288 times per day in total).

    I can share the code so you can look at it, but because of the reasons above, it will be impossible to execute it.

    Main transformation:
    https://drive.google.com/open?id=1Mc...geVkHVVJi1DAoW
    Sub transformation (logging only):
    https://drive.google.com/open?id=1K0...LLDtSmS8kIC3KF

    To summarize the error/crash:
    For some reason one of the Group By steps tries to reference a column which it can't find. However, this column name is only temporary and not used at all in the Group By step!

    By the way, the title of this thread is a bit of a side-track now. setOutputDone() worked great on its own, but it did not stop or avoid the bug/error/crash.
    Last edited by Sparkles; 05-31-2019 at 11:10 AM.

  7. #7

    Yikes, I shouldn't have asked :P

    The error message did sound familiar, and I was able to come across PDI-16969, which looks like it may fix the issue you're experiencing. There were some other changes since 5.4 that may also contribute to the fix, so it's probably not just this one change that does it.

    I'd recommend trying out PDI 8.1 in a dev environment, as that may resolve the issue you're seeing.

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.