Hitachi Vantara Pentaho Community Forums

Thread: Spoon Insert/Update performance using SQR for Bugzilla

  1. #1

    Spoon Insert/Update performance using SQR for Bugzilla

    My organization is currently evaluating the Pentaho BI suite, and as part of this evaluation we are trying to get the SQR for Bugzilla application (written by Nick Goodman, available at http://sourceforge.net/projects/qareports) to work on our own Bugzilla database.

    While the transformations and jobs work (with little or no modification), they seem to take ages to run. I am sure there is something wrong in the way I have set up the application; for example, the attached transformation (bugz_int_issue_step2.ktr) takes ~24 hours to run. The steps taking the longest are the Update and Delete steps (running at around 0.5 records/second).

    I have tried:
    1. Enabling and then disabling result streaming (neither made a difference)
    2. Enabling database caching in the Database Lookup step
    3. Different settings for the commit size on the insert

    None of these seems to help. I am a newbie to Kettle; is there something basic that I should try?

    I am running Pentaho 1.7 GA on Linux with a MySQL database. The total number of records is less than 300,000.
    Attached Files: bugz_int_issue_step2.ktr

  2. #2
    DEinspanjer Guest


    One other thing you could try is going into the transformation settings (Ctrl-T) and, on the Misc tab, selecting the option to use one unique connection for all steps.
    With performance that low, I would suspect you are running into very bad locking issues or something.

  3. #3


    Thank you for your suggestion. I enabled "use unique connections" in the transformation settings, and the update operation is still at 0.5 records/second.

    Quote Originally Posted by DEinspanjer View Post
    One other thing you could try is going into the transformation settings (Ctrl-T) and, on the Misc tab, selecting the option to use one unique connection for all steps.
    With performance that low, I would suspect you are running into very bad locking issues or something.

  4. #4


    Thanks for suggesting it might be a row-locking issue. I just switched my engine from InnoDB to MyISAM and my updates are 7 times faster. However, at 3.5 records/second and half a million records, this transform still takes a lot of time. Is there anything else I can try within Kettle, or is 3.5 r/s the average speed for lookups/inserts? Thanks!
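
    For anyone trying the same thing, the engine switch itself is a one-liner in MySQL. The table name below is just a placeholder; point it at whichever table the Update step writes to:

        -- Placeholder table name; substitute the actual target table of the Update step.
        ALTER TABLE issue_fact ENGINE = MyISAM;

        -- Confirm which engine the table is now using:
        SHOW TABLE STATUS WHERE Name = 'issue_fact';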

  5. #5
    Hockeychap


    Quick question - have you set up any indices on the two fields that you are doing a lookup on?

    To give you some idea of the impact: without indexing, MySQL on a laptop under Kettle reads about 150 rows/s; with indexing, around 2,000 rows/s.

    Having had a quick look at your update - do you have a multi-column index on the fields ISSUE_NAT_ID and ISSUE_ACTION_ID? If not, something like the index sketched below should help.
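
    This is only a sketch - the table name is a guess, so substitute whichever table your Update step targets:

        -- Composite index covering both lookup keys used by the Update step.
        -- "issue_fact" is a guessed table name; replace it with the real target table.
        CREATE INDEX idx_issue_nat_action
            ON issue_fact (ISSUE_NAT_ID, ISSUE_ACTION_ID);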

    Cheers
    Justin

  6. #6


    Nope, I didn't have an index on those two fields - I thought it would slow down the insert operation. I just added one and my entire table now processes in 6 seconds. Thanks a million, hockeychap! You totally rule!

    Quote Originally Posted by Hockeychap View Post
    do you have a multi-column index on the fields ISSUE_NAT_ID and ISSUE_ACTION_ID ?

    Cheers
    Justin

  7. #7
    Hockeychap


    You're welcome - just glad I could help.

    Justin
