Hitachi Vantara Pentaho Community Forums

Thread: Performance w.r.t Time

  1. #1

    Performance w.r.t Time

    Hi All
    I have built a transformation with just two steps: a Table Input and a Table Output.

    The database is at a remote location. The transformation reads data from one table and writes it to another table in the same database on the same host.
    The problem is that it is taking far too long. For example, I noticed that it takes about 7 minutes just to write 1000 rows.

    Is this the usual time Kettle takes? Does anybody have experience or benchmarks for how many rows Kettle should be able to process in this time?

    Thanks in advance
    Shuja

  2. #2

    In your situation, the slow performance is probably not caused by Kettle at all; it is more likely due to the overhead of shipping every row to and from your remote server over the network.
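
    To put a rough number on it: 1000 rows in 7 minutes is about 2.4 rows per second. If each row is sent as its own INSERT and committed individually, a few hundred milliseconds of network round trip per row is enough to produce exactly that kind of throughput, so latency alone can plausibly explain what you are seeing.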

    Here are some suggestions:

    1. If you have indexes on the target table, they will slow down every INSERT as it is processed. Can you drop them and add them back once the transformation has finished? (There is a rough sketch of this at the end of this post.)

    2. Try changing the value of "Commit size" and maybe enabling "Use batch update for inserts" in the Table Output step. The default values are probably not bad, but they are worth tuning for a remote database (the sketch at the end of this post shows the batching idea).

    3. Have you considered running the transformation on the remote server itself, for example with the command-line Pan application that ships with PDI? That removes the overhead of moving data across the network entirely (example invocation below this list). http://wiki.pentaho.com/display/EAI/...+Documentation

    4. Consider using a Carte server to run the transformation on the remote server. http://wiki.pentaho.com/display/EAI/...+Documentation

    5. Consider running the transformation through Carte or Pan on a server that is "closer" to your remote server, e.g. in the same datacenter.
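
    For reference, on a Linux install of PDI, Pan is typically invoked as "sh pan.sh -file=/path/to/your_transformation.ktr -level=Basic", and a Carte server is started with "sh carte.sh 0.0.0.0 8080"; the file path, address, and port here are just placeholders for your own setup.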

    Of course there are also a whole bunch of network performance things you could do, such as trying to find the bottleneck and seeing if you can optimise your network for this data transfer.
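
    As a minimal sketch of points 1 and 2 done by hand outside Kettle (assuming a MySQL target; the host, credentials, table, index, and column names are made up for illustration):

        # Drop the index, insert in batches with one commit per batch,
        # then rebuild the index. Uses mysql-connector-python (DB-API).
        import mysql.connector

        conn = mysql.connector.connect(
            host="remote-db.example.com", user="etl",
            password="secret", database="mydb",
        )
        cur = conn.cursor()

        # Point 1: remove the secondary index before the bulk load.
        cur.execute("DROP INDEX ix_target_col1 ON target_table")

        rows = [("row-%d" % i, i) for i in range(1000)]  # stand-in source rows
        batch = 500  # analogous to Table Output's "Commit size"
        for i in range(0, len(rows), batch):
            # Point 2: batched statements, one commit per batch, not per row.
            cur.executemany(
                "INSERT INTO target_table (col1, col2) VALUES (%s, %s)",
                rows[i:i + batch],
            )
            conn.commit()

        # Rebuild the index once the load has finished.
        cur.execute("CREATE INDEX ix_target_col1 ON target_table (col1)")
        conn.close()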

  3. #3


    Rather than focusing on Kettle first, make sure your expectations are realistic.

    Use your database tool of choice (the mysql client, MS SQL Enterprise Manager, SQL Workbench/J, whatever) and connect to the database remotely, exactly the way Kettle does.

    Then run the same SELECT and measure the performance with that tool. After that, create and run a script doing similar INSERTs from your local box to the remote location.

    Although these quick tests won't be a one-for-one comparison with Kettle, you will most likely find that it is your environment causing issues that Kettle cannot overcome. (A rough timing sketch follows.)
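
    If you want hard numbers for that comparison, here is a rough sketch (Python with mysql-connector-python; the connection details and table names are placeholders):

        # Measure raw remote SELECT and worst-case row-by-row INSERT speed,
        # independent of Kettle, to see what the environment allows.
        import time
        import mysql.connector

        conn = mysql.connector.connect(
            host="remote-db.example.com", user="etl",
            password="secret", database="mydb",
        )
        cur = conn.cursor()

        t0 = time.time()
        cur.execute("SELECT col1, col2 FROM source_table LIMIT 1000")
        rows = cur.fetchall()
        print("SELECT: %d rows in %.1fs" % (len(rows), time.time() - t0))

        t0 = time.time()
        for r in rows:
            # One INSERT and one commit per row: deliberately pessimistic,
            # to mirror an untuned Table Output step.
            cur.execute(
                "INSERT INTO target_table (col1, col2) VALUES (%s, %s)", r
            )
            conn.commit()
        print("INSERT: %d rows in %.1fs" % (len(rows), time.time() - t0))
        conn.close()

    If those times are in the same ballpark as the 7 minutes reported above, the network, not Kettle, is the limiting factor.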

    P.S. One other option, on top of mfantcook's excellent list: if it is MySQL, or another database for which Kettle supports bulk inserts (i.e. writing a specially formatted CSV file and transferring it so it is local to the database), use that approach. Measuring file-transfer speed is easy to reason about, and the raw insert on the database side will be much faster (sketch below).
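
    To illustrate the bulk-load route on MySQL (the file name, table, and connection details are placeholders, and both client and server must allow LOAD DATA LOCAL INFILE):

        # Write the rows to a CSV file once, then load it in a single
        # statement instead of thousands of INSERT round trips.
        import csv
        import mysql.connector

        with open("load.csv", "w", newline="") as f:
            csv.writer(f).writerows([("row-%d" % i, i) for i in range(1000)])

        conn = mysql.connector.connect(
            host="remote-db.example.com", user="etl", password="secret",
            database="mydb", allow_local_infile=True,
        )
        cur = conn.cursor()
        cur.execute(
            "LOAD DATA LOCAL INFILE 'load.csv' INTO TABLE target_table "
            "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' "
            "(col1, col2)"
        )
        conn.commit()
        conn.close()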
