Hitachi Vantara Pentaho Community Forums

Thread: Jobs in Parallel

  1. #1
    Join Date
    Sep 2008
    Posts
    15

    Question Jobs in Parallel

    Hi,

    I'm trying to run jobs in parallel this time (Kettle/Spoon version 3.0.4). I have read the wiki page by Matt and have tried two approaches:

    Job A pointing to Job B and Job C
    Job A pointing to Transform B and Transform C

    My understanding of what should happen was that A would kick off B and C in parallel. However, the log files do not indicate that; it's B followed by C!

    Also, with or without enabling the launch-in-parallel option, the run time for these jobs seems the same. Am I missing something?

    Hoping to hear back soon.

    Thanks a lot,
    Sowj.

  2. #2
    Join Date
    May 2006
    Posts
    4,882

    Default

    Attach your jobs

    Regards,
    Sven

  3. #3
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    This is expected and documented behavior.

    http://wiki.pentaho.com/display/EAI/...es+in+parallel

  4. #4
    Join Date
    Sep 2008
    Posts
    15

    Default

    Thanks for the quick response, Sven and Matt.

    I'm attaching two jobs here. Load_product gets called in the first job, and classifier is another table loaded the same way. These tables are not related in any way, and there are many more such tables that I'd like to load in parallel, but I was testing with two of them to begin with to try out the functionality.

    I think my biggest concern here is time; like I mentioned in my earlier post, it takes the same amount of time with or without parallelism!

    Thanks a ton!
    -Sowj.
    Attached Files

  5. #5
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    The new acronym to use these days is RTFW, I think (Read The Fine Wiki).

    The execution model cited above makes it harder to execute a certain number of job entries in parallel and then simply continue with something else in sequence.
    To do this, we suggest you wrap up the parallel work in a separate job.
    As such, if you have two job entries you want to run in parallel, A and B, you simply create a new job with a Start entry followed by A and B in parallel.
    Call that job C and use it in the parent job like any other job entry.
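    If it helps to think of this in code, here is a rough analogy in plain Java (not Kettle's own API; runJobA, runJobB, and runNextEntry are just hypothetical placeholders for your job entries). The wrapper job C behaves like a small thread pool that is handed two tasks and joined before anything downstream runs:

        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Executors;
        import java.util.concurrent.Future;

        public class WrapperJobAnalogy {
            public static void main(String[] args) throws Exception {
                // "Job C": the wrapper whose Start entry launches A and B in parallel.
                ExecutorService pool = Executors.newFixedThreadPool(2);
                Future<?> a = pool.submit(() -> runJobA());
                Future<?> b = pool.submit(() -> runJobB());

                // The wrapper only finishes when both A and B have finished.
                a.get();
                b.get();
                pool.shutdown();

                // The parent job runs C like any other entry, so everything
                // after this point happens strictly after A and B are done.
                runNextEntry();
            }

            // Hypothetical placeholders for the real job entries.
            static void runJobA()      { System.out.println("Job A done"); }
            static void runJobB()      { System.out.println("Job B done"); }
            static void runNextEntry() { System.out.println("Continuing in sequence"); }
        }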

    Matt

  6. #6
    Join Date
    Aug 2009
    Posts
    17

    Default

    I have been trying to understand this issue a little more and am afraid that I have yet to feel comfortable with its true parallel'ness (is that even a word?!).

    My job is structured as documented in the wiki, and yet I have not noticed a significant change in processing time. My fear is that if sub-jobs have more than one step, or you reuse a job/transformation for the parallel steps while passing different parameters, they all get processed round-robin style.

    For example: jA has 10 child jobs, jB1-10, set to run in parallel.
    Jobs jB1-10 are actually the same job with slightly different parameters passed to them. Within the job, there are a number of steps to pick up a given file (passed via parameter) and do a bulk insert into MySQL.

    From what I can tell, by observing the local filesystem as well as the logs and MySQL, only one file is getting processed at a time.

    Is this because I'm trying to reuse the same job step ten times?
    01010100011010000110000101101110011010110111001100100001
    Richard
    @nitrohawk.com
