Hitachi Vantara Pentaho Community Forums

Thread: How to improve performance of the Table Input/Table Output steps?

  1. #1
    Join Date
    Sep 2012
    Posts
    27

    Default How to improve performance of the Table Input/Table Output steps?

    I have a transformation with a Table Input step (reading a table with around 50 columns) that writes to a remote database through a Table Output step.
    It processes about 30,000 records and takes around 3 minutes. How can I reduce the transformation time? Will increasing the commit size help?
    Are there any other parameters that can be tuned?

    In the transformation settings, these are the values I have:

    <size_rowset>10000</size_rowset>
    <sleep_time_empty>50</sleep_time_empty>
    <sleep_time_full>50</sleep_time_full>
    <unique_connections>N</unique_connections>
    <feedback_shown>Y</feedback_shown>
    <feedback_size>50000</feedback_size>
    <using_thread_priorities>Y</using_thread_priorities>
    <shared_objects_file/>
    <capture_step_performance>N</capture_step_performance>
    <step_performance_capturing_delay>1000</step_performance_capturing_delay>
    <step_performance_capturing_size_limit>100</step_performance_capturing_size_limit>
    In the Table Output step:

    <commit>1000</commit>
    <truncate>Y</truncate>
    <ignore_errors>N</ignore_errors>
    <use_batch>Y</use_batch>
    <specify_fields>Y</specify_fields>
    <partitioning_enabled>N</partitioning_enabled>
    <partitioning_field/>

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    There must be some reason why you don't use bulk loading.
    So long, and thanks for all the fish.

  3. #3
    Join Date
    Sep 2012
    Posts
    27

    Default

    Quote Originally Posted by marabu
    There must be some reason why you don't use bulk loading.
    I was not aware of this step. Is there an example handy?
    In the bulk loader example, I could see only the target table. Where do I specify the source table?
    Thanks

  4. #4
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    The bulk loader step is meant to replace a Table Output step, so you still need some input step.
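    As a minimal sketch, this is roughly what the hop section of the transformation XML looks like once a bulk loader replaces Table Output (step names here are illustrative; the source table is whatever the Table Input step's SQL selects):

    <order>
      <hop>
        <from>Table input</from>
        <to>Bulk loader</to>
        <enabled>Y</enabled>
      </hop>
    </order>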
    So long, and thanks for all the fish.

  5. #5
    Join Date
    Sep 2013
    Posts
    235

    Default

    Get as much RAM as you can.
    Increase the row set size for the transformation (under the hood this is the size of the BlockingRowSet shared by the two concurrently running threads, Table Input and Table Output). Set the Table Output commit size as high as possible (load as much data into local memory as you can before the output step releases it), turn off logging, turn off all metrics, and run the transformation on a Carte server rather than in the Spoon desktop client (Jetty consumes less memory than Spoon's SWT UI).
    Post the database types for input and output, and write back whether any of the above helps.
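    For illustration only, these are the corresponding knobs in the transformation XML; the values below are examples to experiment with, not recommendations:

    <!-- transformation settings -->
    <size_rowset>50000</size_rowset>

    <!-- Table Output step -->
    <commit>10000</commit>
    <use_batch>Y</use_batch>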

  6. #6

    Default

    I don't see any mention of the database engine here, and the answers can vary greatly depending on the database and driver used.

    With MySQL I use rewriteBatchedStatements=true for table output, which helps immensely with throughput. It makes the bulk loader almost unnecessary.
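    For reference, the option is simply appended to the JDBC URL (jdbc:mysql://host:3306/db?rewriteBatchedStatements=true). If you enter it on the connection's Options tab in Spoon instead, it is stored in the connection XML under an EXTRA_OPTION prefix; a sketch from memory, so treat the exact nesting as an assumption:

    <attributes>
      <attribute>
        <code>EXTRA_OPTION_MYSQL.rewriteBatchedStatements</code>
        <attribute>true</attribute>
      </attribute>
    </attributes>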

  7. #7
    Join Date
    Sep 2012
    Posts
    27

    Default

    Thanks. I tried the bulk loader step and I see a tremendous improvement.
    A small question: in the Table Output step I use the "Truncate table" option. How can I achieve this in the bulk loader step?

  8. #8
    Join Date
    Sep 2012
    Posts
    27

    Default

    Quote Originally Posted by jsturm
    I don't see any mention of the database engine here, and the answers can vary greatly depending on the database and driver used.

    With MySQL I use rewriteBatchedStatements=true for table output, which helps immensely with throughput. It makes the bulk loader almost unnecessary.
    We use the PostgreSQL engine.

  9. #9
    Join Date
    Sep 2012
    Posts
    27

    Default

    Quote Originally Posted by lramamur
    Thanks. I tried the bulk loader step and I see a tremendous improvement.
    A small question: in the Table Output step I use the "Truncate table" option. How can I achieve this in the bulk loader step?
    OK, I see that the bulk loader has a "Truncate" option, which truncates first and then inserts.

  10. #10
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Ignore the advice on increasing the row set buffer size; it usually has a negative impact, reducing parallelism and increasing memory usage.

    Custom SQL can be executed simply by adding an Execute SQL script step anywhere in the transformation (unconnected to anything else). The SQL in that step is executed at transformation initialization.
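    A minimal sketch of such a step in the transformation XML, assuming a hypothetical connection named "Remote DB" and a hypothetical target table target_table (tag names as I recall Spoon generating them for the Execute SQL script step):

    <step>
      <name>Truncate target</name>
      <type>ExecSQL</type>
      <connection>Remote DB</connection>
      <execute_each_row>N</execute_each_row>
      <sql>TRUNCATE TABLE target_table</sql>
    </step>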
