Hitachi Vantara Pentaho Community Forums

Thread: How to optimize Table Input?

  1. #1
    Join Date
    Dec 2017
    Posts
    4

    How to optimize Table Input?

    I started working with PDI a few days ago, after a few months of working with SSIS. I have also switched to PostgreSQL.

    When working with SQL Server and SSIS, I would tune a few parameters to get a nice, fast flow of data through my ETL. Pentaho seems to work fast and like a charm on transformations, but I feel there is a huge bottleneck at my Table Input step, which is running at 1200 rows/s.

    Assuming this is not a PostgreSQL issue, could somebody point me in the right direction on how to properly set up my connection and settings so I can extract my data faster?

    Thanks in advance.

    [Attached screenshot: javaw_2017-12-19_21-20-50.jpg]

  2. #2
    Join Date
    Apr 2008
    Posts
    4,696

    If I had to guess, I'd say the bottleneck is at "Save to Staging", not "get transactions", since it has 10000 rows waiting in its input buffer, while only 9000 rows that it has already processed are waiting in its output buffer for the next step.

    I don't know how SSIS handles this, but in PDI each step of a transformation pauses processing when its output queue (rows it has processed but the next step hasn't yet taken) reaches the row limit. That limit is a transformation setting; the default is 10000 rows.
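To make the effect of that row limit concrete, here is a rough sketch in Python. It is not PDI code, just a toy model of two steps joined by a bounded buffer; the function name, the "tick" time unit, and the rates are all made up for illustration.

```python
def simulate(read_rate, write_rate, buffer_limit, ticks):
    """Toy model of two steps joined by a bounded row buffer.

    read_rate / write_rate: rows per tick each step could handle on its own.
    buffer_limit: maximum number of rows allowed to wait between the steps.
    Returns (rows_read, rows_written, rows_left_in_buffer).
    """
    buffered = 0
    rows_read = 0
    rows_written = 0
    for _ in range(ticks):
        # The upstream step can only emit rows that still fit in the buffer;
        # once the buffer is full it has to wait.
        free = buffer_limit - buffered
        produced = min(read_rate, free)
        buffered += produced
        rows_read += produced

        # The downstream step drains the buffer at its own pace.
        consumed = min(write_rate, buffered)
        buffered -= consumed
        rows_written += consumed
    return rows_read, rows_written, buffered


if __name__ == "__main__":
    # Hypothetical numbers: a reader capable of 5000 rows/tick feeding
    # a writer capable of only 1200 rows/tick, with a 10000-row limit.
    print(simulate(read_rate=5000, write_rate=1200, buffer_limit=10000, ticks=100))
```

In this model the reader starts fast, fills the buffer within a few ticks, and from then on can only move as many rows per tick as the writer takes out. So the speed shown on the input step ends up matching the slow step downstream, even though the input step itself is not the slow one.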

    If you really want to know whether your "get transactions" step is slow, connect it directly to a Dummy step and check the processing speed. That is the speed at which Postgres retrieves the rows and sends them over the network to your PDI machine. If Postgres and PDI are close in network terms (the closest case being the DB at 127.0.0.1 relative to PDI), the network won't be your bottleneck. If they are far apart (e.g. a transcontinental link between the DB and PDI), the network could be the bottleneck.
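The same "read and throw away" test can be done outside PDI to get a second opinion on the raw read speed. A minimal sketch in Python, assuming you already have some iterable of rows; the helper name is hypothetical, and whether your database driver's cursor can be iterated directly like this is something to check for your driver.

```python
import time


def measure_throughput(rows):
    """Drain an iterable of rows, discarding each one, and time it.

    Plays the role of a source step wired straight into a do-nothing
    step: nothing downstream can slow the reader.
    Returns (row_count, elapsed_seconds, rows_per_second).
    """
    start = time.perf_counter()
    count = 0
    for _ in rows:
        count += 1  # the row is dropped on purpose
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, elapsed, rate


if __name__ == "__main__":
    # Stand-in data source; replace with an iterable that yields
    # the rows of your query.
    count, elapsed, rate = measure_throughput(range(1_000_000))
    print(f"{count} rows in {elapsed:.3f} s ({rate:.0f} rows/s)")
```

If the rate measured this way is far above what the transformation shows, the reader is probably being held back by a later step rather than by the database or the network.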

  3. #3
    Join Date
    Dec 2017
    Posts
    4

    Thank you for the input. I am gonna try that tomorrow. I am also gonna try to set up Carte on the same network as the servers and maybe optimize network speeds.

    Sorry to hijack my own question with another question: do you happen to know how to set up Carte to work online? I am using Azure + Windows Server and would love to run my requests on a server that is in the same network as the database.

    Thanks for the help once again!
