Hitachi Vantara Pentaho Community Forums

Thread: Memory error

  1. #1

    Memory error

    Hi all,

    I need your help as always:

    I've finally built a job that generates a SOAP request to Oracle CRMoD, gets the XML response, parses it, and saves it to a table.

    These are the operations:

    1. Truncate the target table "B"
    2. Evaluate whether there are any records to be requested in table "A"
    3. A transformation that takes the first record from table "A", makes a SOAP request, and saves the collected data in table "B"
    4. Update table "A" to flag the processed record.
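
    The steps above can be sketched like this, using sqlite3 as a stand-in for the real database (the table and column names here are made up for illustration, and the SOAP call is stubbed out):

    ```shell
    #!/bin/sh
    # Sketch of the job's control flow. Assumptions: tables "table_a"
    # (source, with a processed flag) and "table_b" (target).
    DB=/tmp/crmod_demo.db
    rm -f "$DB"
    sqlite3 "$DB" <<'SQL'
    CREATE TABLE table_a (id INTEGER, processed TEXT DEFAULT 'N');
    CREATE TABLE table_b (id INTEGER, payload TEXT);
    INSERT INTO table_a (id) VALUES (1), (2), (3);
    SQL

    # 1. Truncate target table B (sqlite has no TRUNCATE; DELETE is equivalent)
    sqlite3 "$DB" "DELETE FROM table_b;"

    # 2-4. While there are unprocessed rows in A: take the first one,
    #      "call" the SOAP service (stubbed here), store the result in B,
    #      and flag the row in A as processed.
    while :; do
      ID=$(sqlite3 "$DB" "SELECT id FROM table_a WHERE processed='N' LIMIT 1;")
      [ -z "$ID" ] && break
      PAYLOAD="response-for-$ID"        # stand-in for the parsed SOAP response
      sqlite3 "$DB" "INSERT INTO table_b VALUES ($ID, '$PAYLOAD');
                     UPDATE table_a SET processed='Y' WHERE id=$ID;"
    done
    sqlite3 "$DB" "SELECT count(*) FROM table_b;"   # prints 3
    ```

    The point of the sketch is only the shape of the loop: each pass touches exactly one row of "A", which is why the memory problem described below grows with the number of passes rather than with row size.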



    Now here is the problem:
    After 272 processed records (and it is always exactly 272) I receive a Java out-of-memory error.
    I've tried modifying the Spoon.bat script as follows:
    Code:
    if "%PENTAHO_DI_JAVA_OPTIONS%"=="" set PENTAHO_DI_JAVA_OPTIONS="-Xms6g" "-Xmx6g" "-XX:MaxPermSize=4g"
    Of course with no luck!
    This wouldn't be a problem... the MAIN problem is that the process keeps running until I have another error related to the GC Memory, then process dies.
    At the moment I've set up the same job with a twist...
    The job counts how many records have been processed, and after 270 an extra evaluation step ABORTs the current job.
    Then I've cheated with the Windows Task Scheduler, launching the process every minute (only if the job isn't already running, of course).
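
    The "launch only if not already running" trick can be sketched as a small lock-file wrapper (a sketch only; the kitchen path and job file are assumptions):

    ```shell
    #!/bin/sh
    # Run the load job only if a previous launch is still in progress.
    LOCKFILE=/tmp/load_job.lock

    if [ -e "$LOCKFILE" ]; then
        echo "job already running, skipping this launch"
        exit 0
    fi

    touch "$LOCKFILE"
    trap 'rm -f "$LOCKFILE"' EXIT    # release the lock even on abnormal exit

    # The real call would be something like:
    #   /opt/pentaho/data-integration/kitchen.sh -file=/jobs/load_crmod.kjb
    echo "job started"

    rm -f "$LOCKFILE"                # normal cleanup (the trap also covers errors)
    ```

    On Windows the scheduler itself can be told not to start a new instance while one is running, which achieves the same effect without the lock file.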

    Does anybody have a better approach?


    Thanks

    P.S. If you need any help with the SOAP request just ask.
    Last edited by alessio.missio; 05-28-2018 at 12:23 PM. Reason: missing intro

  2. #2
    Join Date
    Aug 2016
    Posts
    290

    Default

    I'm not sure a loop is a good thing inside a job/transformation, or whether Kettle is supposed to work with infinite loops.

    1) you have a (possibly infinite) loop
    2) each step makes a new row object based on the input row

    So I guess that with this loop you keep creating objects faster than garbage collection can remove them. A couple of things I have noticed:

    1) With the User Defined Java Class step, it is possible to pass the original input row through as output without creating a new object. That might not be feasible when handling transactions with external resources, but if you could access the SQL library, you could either write the loop inside the code or pass the same row object between steps.
    2) What about inserting a Wait step that pauses for a few seconds?

  3. #3

    Default

    Hi Sparkles, thank you for your reply!

    1. Infinite loops: impossible here. I've tested the procedure, and whenever a record is processed I normally delete it (in this case I update its processing flag instead).
    2. Yes, I think so, but I don't have that many records to retrieve via SOAP (about 230K).

    About the solution you suggest:

    1. This could be a solution! Sadly I'm not a Java programmer, and I don't have the time to learn Java, learn how to access the SQL library, and deal with all that.
    2. I've inserted a one-minute Wait step, but with no luck at all. I know the problem is memory consumption; if there were a step that could run the GARBAGE COLLECTOR, that would be great!

    If you have any other ideas/approach please feel free to post more.

    P.S. Oracle CRMoD is crap (I'm joking)
    Last edited by alessio.missio; 05-29-2018 at 03:23 AM.

  4. #4
    Join Date
    Aug 2016
    Posts
    290

    Default

    Maybe you could move the entire operation into a transformation and have a step continuously output rows, which would then trigger the subsequent steps? Let the rows come to an end (and die there) so you avoid the loop.

    Sorry I can't help much more on this topic, but I can add that trying to control garbage collection in Java is futile: you can only suggest it (e.g. with System.gc()), never enforce it. For full manual memory management, you would need C++ or similar.

  5. #5
    Join Date
    Jul 2009
    Posts
    476

    Default

    I have a better approach.

    Instead of looping, edit the Start job step and check the Repeat box. You can set how often you want the job to repeat. I set mine to "Interval" with a 1 second delay, so every time the job finishes, PDI waits one second and then starts the job anew. You can set a delay that works best for you.

    When I tried looping the way you did, I got the same out-of-memory errors, but repeating the job fixed that problem.

  6. #6

    Default

    Hi robj,

    thanks for your idea! I'll try that today, this is a really good approach!

    I'll let you know.

    Thanks again

    A.

  7. #7
    Join Date
    Aug 2016
    Posts
    290

    Default

    Nice, I never noticed that option before! How would you shut this down when it's executed by Kitchen? Find the PID and kill it?

  8. #8
    Join Date
    Jul 2009
    Posts
    476

    Default

    I have an "Evaluate rows number in a table" step near the end of my job. It looks for a row in a database table that tells the job to keep running, like this:

    Code:
    select * from my_table where keep_running = 'Y'
    In that step, I set "Success when rows count" to "Greater than" and Limit to 0, so the step succeeds whenever the query finds the row. The success hop leads to a Success step, which causes the job to repeat; the failure hop leads to an Abort job step, which ends the job. I have a script that updates the database row, changing keep_running from 'Y' to 'N', and when I want to end the job, I run that script.

    Using an Abort job step probably means the PDI command returns a non-zero exit code, indicating an error. There may be a more elegant way to do this, but what I have works for our needs.
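
    As a sketch, the stop switch looks like this, with sqlite3 standing in for the real database (the table and column names follow the example above):

    ```shell
    #!/bin/sh
    # Sketch of the keep_running "stop switch". Assumption: sqlite3 as a
    # stand-in for the production database.
    DB=/tmp/keep_running_demo.db
    rm -f "$DB"
    sqlite3 "$DB" "CREATE TABLE my_table (keep_running TEXT);
                   INSERT INTO my_table VALUES ('Y');"

    # What the 'Evaluate rows number in a table' step checks on each pass:
    sqlite3 "$DB" "SELECT count(*) FROM my_table WHERE keep_running = 'Y';"   # prints 1 -> keep looping

    # The external stop script just flips the flag:
    sqlite3 "$DB" "UPDATE my_table SET keep_running = 'N';"
    sqlite3 "$DB" "SELECT count(*) FROM my_table WHERE keep_running = 'Y';"   # prints 0 -> failure hop -> Abort job
    ```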

  9. #9
    Join Date
    Aug 2016
    Posts
    290

    Default

    Yes, that's one solution, but it also writes an error log entry each time you exit the loop with an abort (which in some situations could be noisy, much like the non-zero return code you mention).

    I must say robj's solution is truly better and more elegant.

  10. #10
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    Or simply copy the rows to the result, and let PDI walk the result set...

  11. #11

    Default

    Hi all,

    just to give you an update.

    I've tried the solution proposed by robj and it worked.

    I don't have memory issues anymore! And this is outstanding.

    The only "bad" thing is that I need to keep PDI open all the time and cannot submit the loading JOB as a batch.

    But it's a very good solution!

    Thanks again

    A.

  12. #12
    Join Date
    Nov 2009
    Posts
    688

    Default

    Quote Originally Posted by alessio.missio View Post
    The only "bad" thing is that I need to keep PDI always open and I cannot submit the loading JOB as a batch.

    A.
    It should also work when the job is started in batch by Kitchen.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.