Hitachi Vantara Pentaho Community Forums

Thread: Kitchen starting second job kills first job

  1. #1
    Join Date
    Jul 2016
    Posts
    5

    Default Kitchen starting second job kills first job

    I have two jobs that I schedule using crontab. Both jobs are started with a common shell script using the syntax below. The jobs do not have the same start time; there is a one-hour gap, but they do overlap (the first does not finish before the second starts). While testing I offset the start times by only one minute.

    ${dataIntegration}/kitchen.sh /file:${pentahoJobs}/${loadObject}/src/${startJob} /level:${jobLogLevel} 1>${pentahoJobs}/${loadObject}/logs/stdout_${startJob%.*}_${dateTime}.txt 2>${pentahoJobs}/${loadObject}/logs/stderr_${startJob%.*}_${dateTime}.txt

    Once the second job starts, the first job is killed. The only message I am finding is in the stderr output:

    /data-integration/spoon.sh: line 219: 62803 Killed "$_PENTAHO_JAVA" $OPT -jar "$STARTUP" -lib $LIBPATH "${1+$@}" 2>&1

    From the stdout output it seems that Karaf is properly starting both instances, so it is not clear how or why the first job is being killed. I am looking for suggestions for logging to better identify the root cause of the failure, or for configuration to allow multiple jobs to run on the same server at the same time.

    14:39:02,965 INFO [KarafInstance]
    *******************************************************************************
    *** Karaf Instance Number: 1 at /data-integration/./system/karaf/caches/default/data-1 ***
    *** Karaf Port: 8802 ***
    *** OSGI Service Port: 9051 ***
    *******************************************************************************

    14:40:03,513 INFO [KarafInstance]
    *******************************************************************************
    *** Karaf Instance Number: 2 at /data-integration/./system/karaf/caches/default/data-2 ***
    *** Karaf Port: 8803 ***
    *** OSGI Service Port: 9052 ***
    *******************************************************************************

  2. #2
    Join Date
    Aug 2016
    Posts
    11

    Default

    You might simply be running out of memory once the second JVM pops up. Try checking memory usage right before the second job starts.
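
    A minimal way to check, assuming a Linux host (the log path here is made up):

    # Snapshot memory just before the second job's scheduled start,
    # e.g. from an extra crontab entry:
    free -m >> /var/log/pdi_mem_before_job2.log

    # After a job dies with "Killed", look for OOM-killer entries
    # in the kernel log:
    dmesg | grep -i -E 'out of memory|killed process'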

  3. #3
    Join Date
    Jul 2016
    Posts
    5

    Default

    Thank you for the quick reply. I thought I had my settings set to notify me of responses, so I neglected to check back.

    It could be a memory issue; the job consumes most of the system memory during execution. What I find odd is that the second job kills the first - I would expect the second to fail to start if there were not enough resources. And if the first were failing for lack of memory, I would expect a message in the log similar to the disk-full messages instead of a "Killed" message in the stderr output. Finally, if we are hitting the memory limit, how does the second job continue to execute - is it really killing the first job to get additional resources?

    Have you come across this before? How did you definitively identify memory as the root cause?

  4. #4
    Join Date
    Aug 2016
    Posts
    11

    Default

    Yeah, we used to run into the same issue back when we ran scheduled jobs. I wrote a quick Python script to log CPU and memory usage, and that was indeed the problem. Mostly we fixed it by queueing jobs so that they don't overlap. What you describe is consistent with the Linux OOM killer: when the system runs out of memory, the kernel sends SIGKILL to whichever process it scores as the best victim, and that is often the largest existing process rather than the one that just started - which is why the first job dies while the second keeps running, and why all you get is a bare "Killed" in stderr.
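
    Not the original script, but a rough shell sketch of that kind of logger (the interval and log path are assumptions):

    #!/bin/bash
    # Append a CPU/memory sample once a minute; run this in the background
    # while the jobs overlap, then inspect the log after a failure.
    LOG=/tmp/pdi_resource_usage.log
    while true; do
        ts=$(date '+%Y-%m-%d %H:%M:%S')
        mem=$(free -m | awk '/^Mem:/ {printf "used=%sMB free=%sMB", $3, $4}')
        load=$(uptime | sed 's/.*load average: //')
        echo "$ts $mem load=$load" >> "$LOG"
        sleep 60
    done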

    I can think of a few solutions:
    - merging both jobs
    - changing the schedule
    - reducing Kitchen's max memory usage so that a single JVM instance can't take more than a set fraction of your memory, though that will probably make everything slower
    - queueing, so the two jobs never run at the same time (a sketch of these last two is below)
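
    A sketch of the memory cap and the queueing, assuming a Linux host and the same variables as your wrapper script (the lock file path and heap sizes are made up; PENTAHO_DI_JAVA_OPTIONS is the variable spoon.sh reads for JVM options, so check your version):

    #!/bin/bash
    # Cap the JVM heap so a single Kitchen instance cannot take the whole box:
    export PENTAHO_DI_JAVA_OPTIONS="-Xms512m -Xmx2048m"

    # Serialize overlapping cron jobs: if the first job still holds the lock,
    # the second invocation blocks here until the first finishes, instead of
    # running alongside it:
    flock /tmp/pdi_jobs.lock \
        ${dataIntegration}/kitchen.sh /file:${pentahoJobs}/${loadObject}/src/${startJob} /level:${jobLogLevel}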

  5. #5
    Join Date
    Jul 2016
    Posts
    5

    Default

    Thanks for the additional detail and the Python suggestion. I have been researching ways to shorten the execution time, to avoid the overlap. Originally I had set the jobs' start times so they would not overlap, but a recent change is causing the first job to run too long, and moving the second job later would have it finishing outside the acceptable window. I will post a follow-up if/when I find a workable solution.

  6. #6
    Join Date
    Aug 2016
    Posts
    11

    Default

    You can save memory if you run both jobs inside a single parent job, since they then share one JVM instead of starting two. You can delay the execution of the second job by inserting a WAIT step before it.

    So if one job is scheduled for, say, 1am and the other for 2am:

    START -> run job 1
          -> (in parallel) wait one hour -> run job 2
