Hitachi Vantara Pentaho Community Forums

Thread: For no reason, job stops responding. Process still running with no activity!!!

  1. #1

    Default For no reason, job stops responding. Process still running with no activity!!!

    This seems to be a recurring problem with my jobs.

    Every once in a while, my jobs seem to stop responding, with no further activity in the log.

    Here's an example of the log output; the job is about 40% through the volume it has to process:

    INFO 01-02 10:05:11,423 - stp_RespondentSTG_tfi - linenr 7250000
    INFO 01-02 10:05:11,540 - stp_ConvertAttr_js - linenr 7250000
    INFO 01-02 10:05:11,646 - stp_CountryID_lkp - linenr 7250000
    INFO 01-02 10:05:11,650 - stp_StateID_lkp - linenr 7250000
    INFO 01-02 10:05:11,651 - stp_PersTypeID_lkp - linenr 7250000
    INFO 01-02 10:05:11,655 - stp_mainSpecialtyID_lkp - linenr 7250000
    INFO 01-02 10:05:11,659 - stp_subSpecialtyID_lkp - linenr 7250000
    INFO 01-02 10:05:11,659 - stp_SourceID_lkp - linenr 7250000
    INFO 01-02 10:05:11,662 - stp_ToRecruitID_lkp - linenr 7250000
    INFO 01-02 10:05:21,225 - stp_RespondentSTG_tfi - linenr 7300000
    INFO 01-02 10:05:21,406 - stp_ConvertAttr_js - linenr 7300000
    INFO 01-02 10:05:21,514 - stp_CountryID_lkp - linenr 7300000
    INFO 01-02 10:05:21,516 - stp_StateID_lkp - linenr 7300000

    Here's the mapping which is causing a problem right now:

    Attachment: job_hang.bmp (218.7 KB)
    Last edited by benjaminleroux; 03-06-2014 at 06:05 PM.

  2. #2

    Default

    Anyone? Am I the only one that has this problem? Could it be my Java version? Or the fact that Business Objects is running on the same machine (it also uses Java)? Or that I use the Unix version?

    This problem is really crippling!!!

  3. #3
    Join Date
    Nov 2008
    Posts
    143

    Default

    It's kinda hard to evaluate such problems without having the same environment, data and jobs.
    Would it be possible for you to test it in another OS and see if the problem persists?
    BTW, which PDI version are you using?

  4. #4
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Could be a lot of reasons including a stalling database, a lock somewhere caused by a transaction.
    Perhaps something went wrong in the "user defined java class" steps?

    If this forum can't help you perhaps you should contact Pentaho support for more personalized help.

  5. #5

    Default

    We're on version 4.0.1.

    with Java version:
    java version "1.6.0_17"
    Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
    Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01, mixed mode)

    I've tried it on 3 different environments (all Linux based). The problem is sporadic; it doesn't always happen.

  6. #6

    Default

    Quote Originally Posted by MattCasters View Post
    Could be a lot of reasons including a stalling database, a lock somewhere caused by a transaction.
    I have been monitoring our source database which is mostly SQL Server 2000 and 2005 and could not find any locked threads.

    Quote Originally Posted by MattCasters View Post
    Perhaps something went wrong in the "user defined java class" steps?
    My user defined Java class does this (it's the same for all of the UDJCs in this mapping; only the columns differ):

    org.apache.commons.codec.digest.DigestUtils.md5Hex(
        new StringBuilder(first_name + "")
            .append("|").append(last_name + "")
            .append("|").append(email_address + "")
            .append("|").append(country_id + "")
            .append("|").append(state_id + "")
            .append("|").append(person_type_id + "")
            .append("|").append(main_specialty_id + "")
            .append("|").append(sub_specialty_id + "")
            .toString())
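    For what it's worth, here is a minimal, JDK-only sketch of what that expression computes. The md5Hex helper below mimics commons-codec's DigestUtils.md5Hex, and the field values are made up for illustration:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class RowKeyHash {

    // JDK-only equivalent of commons-codec's DigestUtils.md5Hex().
    static String md5Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always available
        }
    }

    public static void main(String[] args) {
        // Hypothetical row values; in the real UDJC these come from the input row.
        String firstName = "Ada", lastName = null, email = "ada@example.com";

        // The +"" trick converts a null field to the literal string "null",
        // so null columns still contribute a stable token to the key.
        String key = (firstName + "") + "|" + (lastName + "") + "|" + (email + "");
        System.out.println(md5Hex(key));
    }
}
```

    An expression like this shouldn't stall on its own; it just allocates one StringBuilder per row and hashes the joined fields.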

    Could this cause a stall?

    Thanks,
    Benjamin

  7. #7
    Join Date
    Nov 2008
    Posts
    143

    Default

    Sounds "java-ok" to me.
    I'm not familiar with Unix, so wild guess here: how do things look in your system monitor (RAM, paging, free disk space)?

  8. #8

    Default

    Our server has 16 cores, our RAM has been bumped up to 128 gigs and we have 500gb of space for our ETL.

    I allocated 8gb of RAM to Pentaho/Java (in kitchen.sh)... I tried to allocate more but noticed that the problem got worse.
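    For reference, a quick way to confirm what heap ceiling the JVM actually received after editing kitchen.sh (this is plain Java, nothing PDI-specific):

```java
public class HeapCheck {
    public static void main(String[] args) {
        // maxMemory() reports roughly the -Xmx ceiling the JVM was started with
        // (the JVM may reserve a small amount for itself).
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("max heap ~ " + maxMb + " MB");
    }
}
```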

    We have normal spikes in CPU when doing CPU intensive processes (like loading into MySQL) and rarely swap on the DB.
    Last edited by benjaminleroux; 02-02-2011 at 11:41 AM.

  9. #9
    Join Date
    Nov 2008
    Posts
    143

    Default

    I guess we can rule that one out, then! hehehe

    Have you tried gradually testing the transformation?
    Say, considering you have 10 steps from 0 to 9, run the transformation leaving only the hop from 0 to 1 enabled and disable hops from 1 to 2, 2 to 3 and so on.
    After that, enable hop from 1 to 2, leaving the others after that one disabled and run the transformation again.
    Repeating the process until every hop is enabled.
    Maybe this lets you see whether it locks up in any specific step, or what the CPU/memory usage looks like at that point.

  10. #10
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    For these large boxes, make sure to get on the very latest Java version. I do recall another person having a Sun Java Runtime stall issue.

  11. #11
    Join Date
    Feb 2009
    Posts
    296

    Default

    What happened to me a couple of times:

    With a Stream Lookup step there is a situation where the input/output buffer of the step fills up before all the lookup values are read. The Stream Lookup step then waits for the lookup values to finish, but those values never arrive because they are blocked behind the full buffers.

    You can see this if you watch the step stats in Spoon.

    I think it only happens if you use the same stream for lookup and input (self-referencing thingy)
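    The effect can be sketched with two bounded queues standing in for PDI's row buffers (illustrative only; the buffer size, row counts, and class names here are made up):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class StreamLookupStall {

    // Returns the row at which the upstream step stalls, or -1 if all rows fit.
    static int runUntilStall(int rows, int bufferSize) throws InterruptedException {
        BlockingQueue<Integer> lookupHop = new ArrayBlockingQueue<>(bufferSize);
        BlockingQueue<Integer> mainHop = new ArrayBlockingQueue<>(bufferSize);

        // Simulated Stream Lookup: it drains ONLY its lookup input until
        // end-of-data arrives, ignoring its main input in the meantime.
        Thread lookupReader = new Thread(() -> {
            try {
                while (true) lookupHop.take();
            } catch (InterruptedException ignored) { }
        });
        lookupReader.setDaemon(true);
        lookupReader.start();

        // Simulated upstream step feeding BOTH hops from the same stream.
        for (int row = 0; row < rows; row++) {
            lookupHop.put(row);
            // The main-hop buffer is never drained, so it fills up and the
            // upstream step blocks; end-of-data never reaches the lookup side.
            if (!mainHop.offer(row, 200, TimeUnit.MILLISECONDS)) {
                return row; // stalled: this is the silent hang seen in the logs
            }
        }
        return -1;
    }

    public static void main(String[] args) throws InterruptedException {
        int stallRow = runUntilStall(100, 4);
        System.out.println(stallRow >= 0
                ? "upstream blocked at row " + stallRow + " (buffer full)"
                : "no stall");
    }
}
```

    With a buffer of 4 and 100 rows, the upstream "step" blocks as soon as the main-hop buffer is full, even though the lookup side keeps happily consuming.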
    Fabian,
    doing ETL with his hands bound on his back

  12. #12

    Default

    Well, this is getting weirder and weirder... After stripping down my mapping and trying multiple permutations, I could not identify the root cause. And, to top it all off, the original mapping is working now.

    However, the following mapping (which is extremely simple in nature) got stuck this morning. This mapping takes data out of a SQL Server table, sorts it, and dumps it to a file. The reading of the table was completed, along with the sort, as per the following log entries:

    ...
    INFO 09-02 08:45:31,371 - stp_ResearchRecipient_ti - linenr 27450000
    INFO 09-02 08:45:31,372 - stp_ResearchRecipient_srt - Linenr 27450000
    INFO 09-02 08:45:31,475 - stp_ResearchRecipient_ti - linenr 27500000
    INFO 09-02 08:45:31,551 - stp_ResearchRecipient_srt - Linenr 27500000
    INFO 09-02 08:45:31,729 - stp_ResearchRecipient_ti - Finished reading query, closing connection.
    INFO 09-02 08:45:31,730 - stp_ResearchRecipient_ti - Finished processing (I=27529406, O=0, R=0, W=27529406, U=0, E=0)
    ...

    But then it stopped writing out to the file (and yes, plenty of space is available):

    ...

    INFO 09-02 08:45:59,614 - stp_ResearchRecipient_tfo - linenr 7300000
    INFO 09-02 08:45:59,793 - stp_ResearchRecipient_tfo - linenr 7350000
    INFO 09-02 08:45:59,973 - stp_ResearchRecipient_tfo - linenr 7400000

    [last log entry]

    Here's my mapping:

    Attachment: extract.bmp (208.8 KB)

    I am very hesitant to upgrade Java in PROD... but is this the only thing I can try?

  13. #13
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Did you upgrade your Java Runtime Environment?

  14. #14

    Default

    Quote Originally Posted by MattCasters View Post
    Did you upgrade your Java Runtime Environment?
    This is the last step in troubleshooting for us. Last time we changed the version of Java, many things became unstable... so we're a bit more careful about making the change.

    But if no other solution is brought forth, we will. What is Pentaho's suggested Java version?

  15. #15
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    All I did was comment that I've seen a hang on a big multi-core box fixed by upgrading to the latest 1.6 point version. If you want Pentaho support, I guess you should contact them.
    Unzipping a recent JRE in a directory somewhere shouldn't be too hard. Nothing will "become unstable".

  16. #16

    Default

    Hi Matt,

    Your intuitions were bang on. Upgraded Java to the latest JRE and we have not had a stall in 3 days.

    We are now able to run 5 independent threads of Kitchen, each with 8gb of RAM for Java, with no problems.

    Thanks again,
    Benjamin

  17. #17
    Join Date
    Mar 2006
    Posts
    170

    Default

    Hi Benjamin,

    I am curious as to the point version you updated to?
    Would you mind sharing?

    Also did you try and max the RAM that PDI is given now?

    If possible could you experiment with the RAM setting ... we have a similar env. as yours and have the RAM at:

    -Xmx40960m

    I'm wondering if we too would get better performance by pulling it down.

    I guess I always say BIGGER IS BETTER ... MORE RAM, MORE OF EVERYTHING!

    Perhaps I'm wrong...

    Thanks

    Kent

  18. #18

    Default

    Hi Benjamin,

    Quote Originally Posted by kandrews View Post
    I am curious as to the point version you updated to?
    Would you mind sharing?
    We went from build 1.6.0_17-b04 to build 1.6.0_24-b07

    Quote Originally Posted by kandrews View Post
    Also did you try and max the RAM that PDI is given now?

    If possible could you experiment with the RAM setting ... we have a similar env. as yours and have the RAM at:

    -Xmx40960m
    We went from 2gb to 4gb to 8gb to 20gb... and found that 8gb gave us good results and, more importantly, no garbage-collector errors and no out-of-memory errors. Keeping it lower allows us to run more parallel instances of Kitchen.

    Thanks,
    Benjamin
