Hitachi Vantara Pentaho Community Forums

Thread: Job Performance

  1. #1
    Join Date
    Dec 2006
    Posts
    15

    Default Job Performance

    I am seeing a drastic difference in performance between running a transformation alone and running the same transformation within a job. Here is a simplified version of what I am doing.

    The transformation does the following:
    1. Uses a table input to select data from a MySQL database. When run natively the SQL performs very fast. Within the transformation it is slower but livable.
    2. Passes the output to another table input to select a finer grain of detail.
    3. Then uses table output to write data to a table
    All this will happen in around 1 minute.

    The job adds the following:
    1. Uses table input to get a small list of groups (at most 7) to run the above transformation
    2. Passes the output and sets variables
    3. For each row it gets variables and runs the above transformation. What was hard-coded in the transformation when run alone is now replaced with variables. These are mostly WHERE clause fields in the select.
    4. Repeats till done
    This takes around 10 minutes, and sporadically much longer.
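The job loop described above can be sketched roughly as follows. This is only an illustration, not the actual job: the query, the `sales` table, the `GROUP_ID` variable name, and `run_transformation` are hypothetical stand-ins. It uses Python's `string.Template`, whose `${NAME}` placeholders happen to look like PDI variable references, to show the SQL text changing on each pass and to time each pass separately.

```python
import time
from string import Template

# Hypothetical query: the value that was hard-coded in the standalone
# transformation is now a ${GROUP_ID} placeholder filled in per row.
SQL_TEMPLATE = Template("SELECT id, amount FROM sales WHERE group_id = ${GROUP_ID}")


def run_transformation(sql):
    """Stand-in for the real transformation; it only returns the SQL it was given."""
    return sql


def run_job(group_ids):
    """Run the stand-in transformation once per group and record the time of each pass."""
    timings = []
    for group_id in group_ids:
        sql = SQL_TEMPLATE.substitute(GROUP_ID=group_id)
        start = time.perf_counter()
        run_transformation(sql)
        timings.append((group_id, sql, time.perf_counter() - start))
    return timings


results = run_job([1, 2, 3])
for group_id, sql, seconds in results:
    print(f"group {group_id}: {seconds:.6f}s  {sql}")
```

Logging the time of each pass separately, as this does, would show whether every group is slow or only some of them.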

    I am confident that the SQL is tuned, as shown by the performance of the transformation when run alone. What would make the transformation in the job run so much slower? Could it really be the setting of variables? Is there something I am missing with creating jobs? Some settings not set right?

    Thanks for your help,
    Jason

  2. #2
    Join Date
    May 2006
    Posts
    4,882

    Default

    Just guessing... variables change your SQL and your database needs extra time to "prepare" the queries. If your variables are set the same in sequential runs, the first run would be slower while the next ones would be faster again (if the SQL preparation is cached).

    Variable setting and getting is peanuts timewise.
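To illustrate the guess above: when a variable value is pasted into the SQL text, every distinct value yields a distinct statement, so any cache keyed on the statement text sees a new query each time. A bound parameter keeps one statement text for all values. The sketch below uses an in-memory SQLite database purely as a stand-in (the thread is about MySQL, whose caching behaviour is not shown here), and the `sales` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (group_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10.0), (2, 20.0), (3, 30.0)])

group_ids = [1, 2, 3]

# Style A: the value is substituted into the SQL text, as variable replacement does.
literal_sql = [f"SELECT amount FROM sales WHERE group_id = {g}" for g in group_ids]
literal_rows = [conn.execute(sql).fetchone()[0] for sql in literal_sql]

# Style B: one statement text, with the value passed as a bound parameter.
param_sql = "SELECT amount FROM sales WHERE group_id = ?"
param_rows = [conn.execute(param_sql, (g,)).fetchone()[0] for g in group_ids]

distinct_literal = len(set(literal_sql))
print(f"distinct statement texts: literal={distinct_literal}, parameterized=1")
```

Both styles return the same rows; only the number of distinct statements the database has to parse differs.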

    Regards,
    Sven

  3. #3
    Join Date
    Dec 2006
    Posts
    15

    Default

    Hi Sven,

    Thanks for the reply. What you are saying makes sense, but I am not sure it would explain a tenfold slowdown. I am not sure I made myself clear in my initial post. The standalone transformation takes about a minute, and each individual run of the transformation within the job takes 10 minutes, so a total of 70 minutes for all 7 groups. 70 minutes is a bit of an exaggeration because some groups have much less data and finish faster, but either way the job is slower than the transformation.
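The arithmetic behind those numbers, written out. The figures are the rough ones quoted in this post, and the 70-minute total is an upper bound since some groups finish faster.

```python
groups = 7
standalone_minutes = 1   # transformation run on its own
in_job_minutes = 10      # the same transformation, per run, inside the job

expected_total = groups * standalone_minutes    # what one would hope for
observed_upper_bound = groups * in_job_minutes  # worst case actually seen
slowdown = in_job_minutes / standalone_minutes

print(expected_total, observed_upper_bound, slowdown)
```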

    Thanks again,
    Jason

  4. #4
    Join Date
    May 2006
    Posts
    4,882

    Default

    aah ... I thought 7 + 3 = 10 minutes... not 7 x 10

    What about I/O... Do you run the "fast query" and PDI on the same server? A lot of the bottlenecks I see are I/O-based.

    Regards,
    Sven

  5. #5
    Join Date
    Dec 2006
    Posts
    15

    Default

    Hi Sven,

    Thanks for the quick reply. I appreciate the help. Everything is run on the same server so all should be equal. I will dig deeper on the I/O though.

    Jason

  6. #6
    Join Date
    May 2006
    Posts
    4,882

    Default

    Next try: delete everything in the transformation except your table input. My other hunch is that you're fetching a lot of rows and they take some time to create, although as of 3.x that should be much improved.
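The same isolation idea in miniature: time the read on its own, then the read plus the write, and compare. This is a sketch against an in-memory SQLite database with hypothetical `source` and `target` tables, not a PDI transformation, so the absolute numbers mean nothing; the point is the method of removing steps until the slow one is found.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER, amount REAL)")
conn.execute("CREATE TABLE target (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO source VALUES (?, ?)", [(i, i * 1.5) for i in range(10000)])


def timed(fn):
    """Call fn and return its result together with the elapsed seconds."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def read_only():
    # Equivalent of keeping only the table input step.
    return conn.execute("SELECT id, amount FROM source").fetchall()


def read_and_write():
    # Equivalent of table input followed by table output.
    rows = conn.execute("SELECT id, amount FROM source").fetchall()
    conn.executemany("INSERT INTO target VALUES (?, ?)", rows)
    return rows


rows_read, read_seconds = timed(read_only)
rows_written, full_seconds = timed(read_and_write)
print(f"read only: {read_seconds:.4f}s, read + write: {full_seconds:.4f}s")
```

If the read-only timing is already slow inside the job, the query or the row fetching is the place to look; if not, the later steps are.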

    Regards,
    Sven


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.