Hitachi Vantara Pentaho Community Forums

Thread: How to increase database throughput

  1. #1
    Join Date
    Jan 2014
    Posts
    25

    Default How to increase database throughput

    I have a transformation that loads data in the range of 60 - 80 million rows.
    Run individually, it gives a throughput of more than 7,000 rows/sec. The transformation has to be run for about 70 source tables, each holding 60 - 80 million rows. When run in parallel (6 transformations at once), the throughput falls to below 1,000 rows/sec.

    Below are the settings that are currently used:

    - In the database connection options, defaultRowPrefetch = 200
    - In the transformation properties, 'Nr of rows in rowset' is 50,000
    - In pan.sh, JAVAMAXMEM="2048"
    - In server/data-integration-server/start-pentaho.sh, CATALINA_OPTS includes -Xmx2048m
    - The JDBC driver used is ojdbc6.jar

    There are no indexes on the target table. The database is Oracle. The server has 4 cores.
    Please advise on what else can be done to increase performance.
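
    For reference, a single run is currently started roughly like this (the install path, transformation path and log name are illustrative):

        cd /opt/pentaho/data-integration        # illustrative install location
        # JAVAMAXMEM="2048" in pan.sh gives the Pan JVM a 2 GB heap (-Xmx2048m)
        ./pan.sh -file=/etl/load_table.ktr -level=Basic -logfile=/tmp/load_table.log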

  2. #2
    Join Date
    Apr 2009
    Posts
    337

    Default

    - Launch multiple copies of the Table Input step and monitor how the performance changes.
    - Increase the JDBC fetch size for the DB connection.
    - When running in parallel, are you launching a separate kitchen/pan script for each job/transformation execution, or are the transformations run from Carte / the enterprise repository?
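
    As a rough sketch of the separate-script option, each table gets its own Pan process and its own log file, which also makes per-run throughput easy to compare (paths, the parameter name and the table list are illustrative, and it assumes the transformation takes the table name as a named parameter):

        #!/bin/bash
        # Start one Pan process per table in the background; each run writes its own log.
        cd /opt/pentaho/data-integration        # illustrative install location
        TABLES="CUSTOMERS ORDERS INVOICES"      # illustrative subset of the ~70 tables
        for T in $TABLES; do
            ./pan.sh -file=/etl/load_table.ktr \
                     "-param:TABLE_NAME=$T" \
                     -level=Basic "-logfile=/tmp/load_${T}.log" &
        done
        wait                                    # block until every background load finishes
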
    Regards,
    Madhu

  3. #3
    Join Date
    Jan 2014
    Posts
    25

    Default

    Tried some scenarios with multiple step copies and ran into some issues.
    I have started a separate thread on that: 'Rows duplicating on "change number of copies to start" in table output step'.

    The transformation is not run from Carte or the enterprise repository. I tried two ways of running in parallel:

    1. Invoked a parameterized job from inside a transformation, using the 'Job Executor' step to launch 6 entries. Throughput was poor.
    2. Launched 6 pan scripts one after the other. Throughput was less than 1,000 rows/sec.

    The throughput in the second case was better than in the first.

  4. #4
    Join Date
    Jul 2009
    Posts
    476

    Default

    If you're just inserting data into the table, and the insert step is the bottleneck, you should look at the Oracle Bulk Loader (http://wiki.pentaho.com/display/EAI/Oracle+Bulk+Loader) if you're not using it already. I use the Postgres Bulk Loader because we use Postgres, but the idea is similar - inserting rows is faster when you tell the database to do it in a bulk operation.
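
    As far as I know, that step hands the load to Oracle's SQL*Loader (sqlldr) under the hood; a direct-path run looks something like this (user, table, column and file names are illustrative):

        # Write a minimal SQL*Loader control file, then run a direct-path load.
        printf '%s\n' \
            "LOAD DATA" \
            "INFILE '/tmp/load_target.dat'" \
            "APPEND INTO TABLE TARGET_TABLE" \
            "FIELDS TERMINATED BY ','" \
            "(COL1, COL2, COL3)" > /tmp/load_target.ctl
        sqlldr userid=etl_user/etl_pass@ORCL control=/tmp/load_target.ctl direct=true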
