Hitachi Vantara Pentaho Community Forums
Thread: experience with datawarehousing...slow

  1. #1
    Join Date: Apr 2007 | Posts: 4

    experience with datawarehousing...slow

    We used Kettle in a data warehousing project on UDB v.8.
    We initially used the Insert/Update step to insert about 60K records into our fact table every day.

    This became incredibly slow after just a few days (60K records took about 3 hours).
    The number of select, insert, and update statements required to refresh our fact table was the bottleneck of the fact transformation process.

    We couldn't improve the load performance by changing configuration settings on the Insert/Update step, such as commit size and buffer size.
    The only way we could significantly improve the speed was to use a data export step to export the data to a staging table, and then use UDB commands to merge the data into the fact table.

    The new process takes about 3 minutes.
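
The stage-first, load-second process described above can be sketched roughly as follows. The post does not say which UDB commands were used, so this is only an illustration under stated assumptions: SQLite from Python's standard library stands in for UDB, the merge is emulated with one set-based UPDATE followed by one set-based INSERT, and the table and column names (`fact`, `staging`, `fact_key`, `amount`) are hypothetical.

```python
import sqlite3

def merge_staging_into_fact(conn):
    """Apply all staged rows to the fact table with two set-based statements."""
    with conn:  # commit on success, roll back on error
        # Update fact rows whose key also appears in staging.
        conn.execute("""
            UPDATE fact
               SET amount = (SELECT s.amount FROM staging s
                              WHERE s.fact_key = fact.fact_key)
             WHERE EXISTS (SELECT 1 FROM staging s
                            WHERE s.fact_key = fact.fact_key)
        """)
        # Insert staged rows whose key is not yet in the fact table.
        conn.execute("""
            INSERT INTO fact (fact_key, amount)
            SELECT s.fact_key, s.amount FROM staging s
             WHERE NOT EXISTS (SELECT 1 FROM fact f
                                WHERE f.fact_key = s.fact_key)
        """)
        # Empty the staging table for the next daily load.
        conn.execute("DELETE FROM staging")

# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact (fact_key INTEGER PRIMARY KEY, amount REAL)")
conn.execute("CREATE TABLE staging (fact_key INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO fact VALUES (?, ?)", [(1, 10.0), (2, 20.0)])
conn.executemany("INSERT INTO staging VALUES (?, ?)", [(2, 25.0), (3, 30.0)])

merge_staging_into_fact(conn)
rows = conn.execute("SELECT fact_key, amount FROM fact ORDER BY fact_key").fetchall()
print(rows)
```

The point of the design is that the database processes the whole batch in a couple of statements instead of one lookup plus one insert or update per row, which is probably where most of the time went in the row-by-row version.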

    We're perfectly happy with this stage-first, load-second process, but I'm wondering how other people have used Kettle to refresh fact tables.

  2. #2
    Join Date: Nov 1999 | Posts: 9,729

    If it slows down over time, the lookup is not using an index for some reason.
    With an index, the load time would grow pretty much linearly with the number of rows.
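
One way to check whether a lookup really uses an index is to ask the database for its query plan. The sketch below is only an illustration: it uses SQLite's `EXPLAIN QUERY PLAN` from Python's standard library as a stand-in for UDB's own explain facility, and the table, column, and index names (`fact`, `date_key`, `product_key`, `fact_lookup`) are hypothetical.

```python
import sqlite3

def lookup_plan(conn):
    """Return the plan text for the kind of key lookup an insert/update performs."""
    plan = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT amount FROM fact WHERE date_key = ? AND product_key = ?",
        (1, 1),
    ).fetchall()
    # The last column of each plan row is a human-readable description.
    return " ".join(row[-1] for row in plan)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact (date_key INTEGER, product_key INTEGER, amount REAL)")

# Without an index on the lookup keys, the plan is a full table scan.
plan_without_index = lookup_plan(conn)

# With an index on the lookup keys, the plan should name that index.
conn.execute("CREATE INDEX fact_lookup ON fact (date_key, product_key)")
plan_with_index = lookup_plan(conn)

print(plan_without_index)
print(plan_with_index)
```

If the plan for the lookup shows a scan rather than an index search, every incoming row pays for a pass over the whole fact table, and the load gets slower as the table grows.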

    The alternative, which is typically around 3 times faster than the Insert/Update step, is to insert straight into the target table and define a primary key on it.
    Then set up error handling to catch the rejected records (unique index violations) and send them to an Update step.
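
Outside of Kettle, the insert-first pattern with error handling looks roughly like the sketch below. It is an illustration under assumptions, not the Kettle implementation: SQLite from Python's standard library stands in for the target database, `sqlite3.IntegrityError` plays the role of the rejected-record stream, and the table and column names are hypothetical.

```python
import sqlite3

def insert_then_update(conn, rows):
    """Try to insert every row; rows rejected by the primary key are updated."""
    inserted = updated = 0
    with conn:  # commit on success, roll back on error
        for fact_key, amount in rows:
            try:
                conn.execute(
                    "INSERT INTO fact (fact_key, amount) VALUES (?, ?)",
                    (fact_key, amount),
                )
                inserted += 1
            except sqlite3.IntegrityError:
                # Unique violation: the key already exists, so update it instead.
                conn.execute(
                    "UPDATE fact SET amount = ? WHERE fact_key = ?",
                    (amount, fact_key),
                )
                updated += 1
    return inserted, updated

# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact (fact_key INTEGER PRIMARY KEY, amount REAL)")
conn.execute("INSERT INTO fact VALUES (1, 10.0)")

counts = insert_then_update(conn, [(1, 11.0), (2, 20.0)])
final_rows = conn.execute("SELECT fact_key, amount FROM fact ORDER BY fact_key").fetchall()
print(counts, final_rows)
```

The pattern pays off when most incoming rows are new: those rows cost a single insert with no prior lookup, and only the rejected ones need a second statement.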

    Matt

  3. #3
    Join Date: Apr 2007 | Posts: 4

    Thanks for the reply, Matt.

    We've actually looked into making sure the right indices are created.
    Perhaps it's a UDB v.8 issue, and it would be interesting to see whether the performance slowdown goes away in UDB v.9.

  4. #4
    Join Date: Nov 1999 | Posts: 9,729

    I remember that sort of thing from Oracle too. There you have to run "ANALYZE table" statements to have it pick up the index or even manually mess with the statistics tables. Maybe it's something like that on UDB too. In that case, upgrading isn't going to help.
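
To make the statistics point concrete, here is a small sketch of refreshing optimizer statistics after a load. It is an assumption-laden illustration: SQLite's `ANALYZE` from Python's standard library stands in for Oracle's "ANALYZE table" and for whatever UDB provides (RUNSTATS, to my understanding), and the table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact (fact_key INTEGER, amount REAL)")
conn.execute("CREATE INDEX fact_key_idx ON fact (fact_key)")
conn.executemany(
    "INSERT INTO fact VALUES (?, ?)",
    [(i, float(i)) for i in range(1000)],
)
conn.commit()

# Gather statistics so the optimizer knows how large and how selective
# the table and its index are after the load.
conn.execute("ANALYZE")

# SQLite stores the gathered statistics in the sqlite_stat1 table.
stats = conn.execute(
    "SELECT tbl, idx, stat FROM sqlite_stat1 WHERE tbl = 'fact'"
).fetchall()
print(stats)
```

If stale statistics are the cause, the optimizer may still believe the fact table is tiny and choose a scan over the index, which would explain a load that degrades as the table grows.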

    Matt
