Hitachi Vantara Pentaho Community Forums

Thread: Pentaho Speed Problems

  1. #1

    Pentaho Speed Problems

    Hello,

    I'm currently working on a Pentaho project to replace a SQL databridge. The databridge constantly checks log tables and transfers any changes, in both directions, between two Oracle databases with different structures. One side is represented by a single database; the other side of the bridge is represented by 7 identically structured databases.

    We currently have the following set up:
    TransA - Get all the data for every row in the table, to be used as input for the next job.

    JobA - Run once for each row
    TransB - Set a variable for the database to use (used in a variable connection string).
    TransC - Perform inserts, updates and deletes, and retrieve required data stored on the variable database.


    Although this works correctly, it is VERY slow. Because the last step has to process each record individually, overall throughput is only about 0.6 records per second.

    Is there any way to deal with having 7 output databases that won't severely hurt performance?

    Any help would be really appreciated.

    Joshua

  2. #2
    Join Date
    May 2006
    Posts
    4,882

    The effect of adding output databases is linear, of course. The only way out is to parallelize your outputs. As of 3.1 you can, with some nifty tricks, also parallelize jobs, but you would probably need an extra job layer for it.
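
    To illustrate the idea outside of Pentaho, here is a rough Python sketch of parallel outputs: the change rows are grouped by target database, and each group is applied by its own worker instead of switching the connection for every row. Everything in it is an assumption for illustration (the `TARGETS` dictionary standing in for the seven databases, the `target`/`op`/`key` row fields, the function names); it is not how Pentaho does this internally.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the seven target databases:
# database name -> list of changes that have been applied to it.
TARGETS = {f"db{i}": [] for i in range(1, 8)}


def partition_by_target(changes):
    """Group change rows by the database they belong to, keeping their order."""
    groups = defaultdict(list)
    for change in changes:
        groups[change["target"]].append(change)
    return groups


def apply_batch(target, batch):
    """Apply one target's changes in order.

    A real version would open one connection for this target and
    commit once per batch rather than once per row.
    """
    store = TARGETS[target]
    for change in batch:
        store.append((change["op"], change["key"]))
    return target, len(batch)


def replicate(changes, max_workers=7):
    """Apply all changes, one worker per target database."""
    groups = partition_by_target(changes)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(apply_batch, t, b) for t, b in groups.items()]
        return dict(f.result() for f in futures)
```

    Changes for the same database stay in their original order, since each target is handled by exactly one worker; only the different databases run side by side.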

    Regards,
    Sven

  3. #3
    Join Date
    Nov 1999
    Posts
    9,729

    Sven, that's beside the point. The point is you can't replace a data replication tool with a data integration tool and expect the same performance results.

    I would have expected a hammer/nail answer from you here since the two are radically different.

    The trick in speeding up the data replication is in the efficient detection of the changes that happened to a database in a given timeframe.
    Depending on the configuration, that may or may not be possible to do efficiently.
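
    As a rough sketch of what efficient change detection can look like when a log table is available: remember the highest log id already processed and only ask for rows above it on each poll. The `change_log` table, its columns, and the function names below are hypothetical, and SQLite is used only so the example runs on its own; the same query shape would apply to an Oracle log table.

```python
import sqlite3


def fetch_changes(conn, last_seen_id, limit=1000):
    """Return log rows newer than the high-water mark, oldest first."""
    cur = conn.execute(
        "SELECT log_id, table_name, op, row_key FROM change_log "
        "WHERE log_id > ? ORDER BY log_id LIMIT ?",
        (last_seen_id, limit),
    )
    return cur.fetchall()


def poll_once(conn, last_seen_id):
    """Fetch new changes and return them with the advanced high-water mark."""
    rows = fetch_changes(conn, last_seen_id)
    new_mark = rows[-1][0] if rows else last_seen_id
    return rows, new_mark


def demo():
    """Build a tiny in-memory log table and poll it twice."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE change_log ("
        "log_id INTEGER PRIMARY KEY, table_name TEXT, op TEXT, row_key INTEGER)"
    )
    conn.executemany(
        "INSERT INTO change_log (table_name, op, row_key) VALUES (?, ?, ?)",
        [("orders", "I", 1), ("orders", "U", 1), ("customers", "D", 7)],
    )
    first, mark = poll_once(conn, 0)
    second, mark_after = poll_once(conn, mark)  # nothing new since the first poll
    conn.close()
    return first, mark, second, mark_after
```

    The second poll comes back empty because the mark has moved past every existing row, so each cycle only touches what changed since the last one.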

    Matt

  4. #4
    Join Date
    May 2006
    Posts
    4,882

    lol ... sometimes we let one escape without a hammering

  5. #5

    Thanks for the replies,

    Matt - Luckily the database has been designed for this, and we have a log file we can rip changes out of pretty easily.

    Sven - Thanks, I'll look into 3.1. We're currently using 3.0, so I'll try using an added job layer first as a preferred solution.

    Cheers,

    Joshua
