Hitachi Vantara Pentaho Community Forums

Thread: How to increase database throughput

  1. #1
    Join Date
    Jan 2014

    How to increase database throughput

    I have a transformation that loads data in the range of 60 - 80 million rows.
    When run individually it gives a throughput of more than 7000 rows/sec. The transformation has to be run for about 70 source tables, each holding 60 - 80 million rows. When run in parallel (6 transformations at once), the throughput falls to less than 1000 rows/sec.

    Below are the settings that are currently used:

    - In the database connection, defaultRowPrefetch = 200
    - In the transformation properties, 'Nr of rows in rowset' is 50000
    - In the PDI launch scripts, JAVAMAXMEM="2048"
    - In server/data-integration-server/, CATALINA_OPTS="-Xmx2048m"
    - The JDBC driver used is ojdbc6.jar

    There are no indexes on the target table. The database is Oracle. The server has 4 cores.
    Please advise on what else can be done to increase performance.
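
    For reference, outside of PDI the defaultRowPrefetch setting is an Oracle JDBC connection property. Below is a minimal plain-JDBC sketch of what that connection setup amounts to; the URL and credentials are placeholder assumptions:

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.util.Properties;

        public class PrefetchDemo {
            public static void main(String[] args) throws Exception {
                Properties props = new Properties();
                props.setProperty("user", "scott");        // placeholder credentials
                props.setProperty("password", "tiger");
                // Oracle-specific property; PDI passes it through to ojdbc6.jar
                props.setProperty("defaultRowPrefetch", "200");
                // Placeholder thin-driver URL; adjust host/port/SID for your site
                Connection conn = DriverManager.getConnection(
                        "jdbc:oracle:thin:@dbhost:1521:ORCL", props);
                System.out.println("Connected with prefetch = 200");
                conn.close();
            }
        }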

  2. #2
    Join Date
    Apr 2009


    - Launch multiple copies of the Table Input step and monitor the performance change.
    - Increase the JDBC fetch size for the DB connection (see the sketch after this list).
    - While running in parallel, are you running a separate kitchen/pan script for each job/transformation execution, or are the transformations run from Carte or the enterprise repository?
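
    The JDBC fetch size in the second bullet is what Statement.setFetchSize() controls at the driver level: the number of rows pulled from the server per network round trip. A minimal standalone sketch; the connection details and table name are placeholders:

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.ResultSet;
        import java.sql.Statement;

        public class FetchSizeDemo {
            public static void main(String[] args) throws Exception {
                // Placeholder connection string and credentials
                Connection conn = DriverManager.getConnection(
                        "jdbc:oracle:thin:@dbhost:1521:ORCL", "scott", "tiger");
                Statement stmt = conn.createStatement();
                // Larger values cut round trips at the cost of client memory
                stmt.setFetchSize(10000);
                ResultSet rs = stmt.executeQuery("SELECT * FROM source_table");
                long count = 0;
                while (rs.next()) {
                    count++;  // a real reader would process columns here
                }
                System.out.println("Read " + count + " rows");
                rs.close();
                stmt.close();
                conn.close();
            }
        }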

  3. #3
    Join Date
    Jan 2014


    I tried some scenarios with multiple step copies but ran into issues, and have started a separate thread about them: 'Rows duplicating on "change number of copies to start" in table output step'.

    The transformations are not run from Carte or the enterprise repository. I tried two ways of running them in parallel:

    1. Invoked a parameterized job from a transformation, using the 'Job Executor' step to launch 6 entries. Throughput was poor.
    2. Launched 6 pan scripts one after the other (see the sketch below). Throughput was still less than 1000 rows/sec.

    The throughput in the second case was better than in the first.
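
    A rough sketch of how the second case can be scripted so that all six pan.sh runs start concurrently rather than strictly one after the other; the pan.sh path and .ktr file names are placeholder assumptions:

        import java.util.ArrayList;
        import java.util.List;

        public class ParallelPan {
            public static void main(String[] args) throws Exception {
                List<Process> procs = new ArrayList<Process>();
                for (int i = 1; i <= 6; i++) {
                    // Placeholder paths; point at your PDI install and transformations
                    ProcessBuilder pb = new ProcessBuilder(
                            "/opt/pentaho/data-integration/pan.sh",
                            "-file=/jobs/load_table_" + i + ".ktr");
                    pb.inheritIO();  // show each transformation's log on this console
                    procs.add(pb.start());
                }
                for (Process p : procs) {
                    p.waitFor();  // block until all six transformations finish
                }
            }
        }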

  4. #4
    Join Date
    Jul 2009


    If you're just inserting data into the table and the insert step is the bottleneck, you should look at the Oracle Bulk Loader step (if you're not using it already). I use the PostgreSQL Bulk Loader because we use Postgres, but the idea is similar: inserting rows is faster when you tell the database to do it as a bulk operation.
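
    The Oracle Bulk Loader step hands the data to Oracle's SQL*Loader utility rather than issuing per-row inserts. The same principle shows up in plain JDBC with batching, where many inserts travel to the server per round trip; a minimal sketch with placeholder connection details and table name:

        import java.sql.Connection;
        import java.sql.DriverManager;
        import java.sql.PreparedStatement;

        public class BatchInsertDemo {
            public static void main(String[] args) throws Exception {
                // Placeholder connection string and credentials
                Connection conn = DriverManager.getConnection(
                        "jdbc:oracle:thin:@dbhost:1521:ORCL", "scott", "tiger");
                conn.setAutoCommit(false);  // commit per batch, not per row
                PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO target_table (id, payload) VALUES (?, ?)");
                for (int i = 0; i < 100000; i++) {
                    ps.setInt(1, i);
                    ps.setString(2, "row-" + i);
                    ps.addBatch();
                    if (i % 5000 == 4999) {  // flush periodically to bound memory
                        ps.executeBatch();
                    }
                }
                ps.executeBatch();  // flush the remainder
                conn.commit();
                ps.close();
                conn.close();
            }
        }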
