Hitachi Vantara Pentaho Community Forums

Thread: Load process is very slow...any suggestions to improve performance?

  1. #1
    Join Date
    Dec 2015
    Posts
    21

    Default Load process is very slow...any suggestions to improve performance?

    Hi everyone,

    I have a transformation that loads data from a source table in one Redshift database to a target table in another Redshift database.
    I am doing a full/initial load right now. Loading 100K rows takes 2 hours, and I need to load some tables with 20 million records.
    The performance is very slow.

    Since it's an initial load, it's run manually; no job or scheduling is required.
    So I created the transformation in Spoon, moved it to a Unix server, and am running the process on the server side.
    I have attached the process flow below. I know Redshift is a big-data database and its nodes can cause slowness when writing to the db.

    Is there anything I can do in the Pentaho settings to improve the performance of this process?

    Thanks,
    Raji.
    Attached Images

  2. #2
    Join Date
    Aug 2011
    Posts
    360

    Default

    Hi,

    I don't know Redshift, but:
    1. In the Join Rows step, set "Main step to read from" to your Table Input step.
    2. Deactivate indexes or constraint checks on the target table (if applicable).
    3. If you can split the input table into multiple parts, load each part in parallel into multiple target tables,
    then merge everything with INSERT INTO ... SELECT FROM.

    That said, 100K rows in 2 hours seems really slow, so check your DB configuration and JDBC drivers.
    Also consider increasing the transformation's rowset size if it is small.
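    Point 3 above (split the load into parallel parts) can be sketched like this. The id column and range values are made up for illustration; the idea is to compute contiguous key ranges and feed each one to a separate copy of the transformation, e.g. as a parameter in the Table Input step's WHERE clause:

    ```python
    def id_ranges(min_id, max_id, parts):
        """Split the inclusive range [min_id, max_id] into `parts` contiguous chunks."""
        step = (max_id - min_id + 1) // parts
        ranges = []
        lo = min_id
        for i in range(parts):
            # Last chunk absorbs any remainder so nothing is dropped.
            hi = max_id if i == parts - 1 else lo + step - 1
            ranges.append((lo, hi))
            lo = hi + 1
        return ranges

    # Hypothetical example: four parallel loads over an `id` column.
    clauses = [f"WHERE id BETWEEN {lo} AND {hi}"
               for lo, hi in id_ranges(1, 20_000_000, 4)]
    ```

    Each clause would drive one Table Input instance writing to its own staging table, which you then merge on the Redshift side.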

  3. #3
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    Quote Originally Posted by RajiR View Post
    running the process in server side.
    Unless you are running it at Amazon, you are bringing your entire dataset down to your local server, processing it, and sending it back to Amazon.
    You *will* incur both time and monetary costs by doing that.

    Maybe use the S3 Output step to push it all to an S3 bucket, and then follow Amazon's advice to COPY into your table ( http://docs.aws.amazon.com/redshift/...a-from-S3.html )
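    As a rough sketch of the COPY Amazon recommends, here is a small Python helper that builds the statement. The table name, S3 path, and IAM role below are placeholders, and the options shown (CSV format, COMPUPDATE/STATUPDATE OFF) are just common bulk-load choices; check the linked docs for what your cluster actually needs:

    ```python
    def build_redshift_copy(table, s3_path, iam_role, fmt="CSV"):
        """Build a Redshift COPY statement to bulk-load a table from S3.

        COMPUPDATE/STATUPDATE OFF skips compression analysis and statistics
        updates during the load, which is typical for large initial loads.
        """
        return (
            f"COPY {table} "
            f"FROM '{s3_path}' "
            f"IAM_ROLE '{iam_role}' "
            f"FORMAT AS {fmt} "
            "COMPUPDATE OFF STATUPDATE OFF;"
        )

    # Hypothetical usage; names are illustrative only.
    sql = build_redshift_copy(
        "staging.orders",
        "s3://my-bucket/orders/",
        "arn:aws:iam::123456789012:role/RedshiftCopyRole",
    )
    ```

    You would run the resulting statement through a SQL step or any Redshift client after the S3 upload finishes.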
    Last edited by gutlez; 12-30-2015 at 06:42 PM.

  4. #4
    Join Date
    Dec 2015
    Posts
    21

    Default

    Thanks for the information. I will try the same.
    Raji.

