Hitachi Vantara Pentaho Community Forums
Results 1 to 5 of 5

Thread: Spoon : how to improve performance ?

  1. #1
    Join Date
    Sep 2005
    Posts
    1,403

    Default Spoon : how to improve performance ?

    Hi,


    I've to download (from db2-400) & upload (to Oracle 9i) large data files (>1 million record) with many columns.



    I found Spoon very slow to process these files and I consider to maintain my old fashion tools for this kind of process (pl/sql , sql*loader ...).



    is there any way to improve Spoon performance ?






    Thank you for your help,



    Thierry

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    Default RE: Spoon : how to improve performance ?

    Thierry,

    Sure, use SQL*Loader, why don't you?
    I would be interested in seeing how off-loading the DB2 data to text file and then loading the text-file with SQL*Loader can be any faster then the way you do it now, but hey, knock yourself out.

    In the mean time consider this:
    - Are there many indexes on the target table
    - Are you running on a slow or fast network (from DB2? to Oracle?)
    - What are "Many columns?" 10-20-50-200-500?
    - ...

    I've seen instances where a Kettle transformation would insert 5000 rows per second and I guess that was OK.
    In other conditions you might get 300-400 rows per second and that would still be perfectly OK because of the many bitmap indexes etc.

    Hope this helps,

    Matt

  3. #3
    Join Date
    Sep 2005
    Posts
    1,403

    Default RE: Spoon : how to improve performance ?

    Hi Matt,


    - I'm testing kettle in the same production environment I use to load my datawarehouse. (Fast network, dual processor, 4 gb memory ...)



    - The target table has no index when I load it and it has 180 columns.



    - Downloading to a text file and loading via sql*loader (with direct parameter) is currently 3 times faster than Kettle.



    - With Kettle I get around 300 records per seconds. I use only 2 steps: one input and one output.



    - I really would like to improve performances with Kettle, because it has useful concepts and features. I tried a lot of different settings but I currently failed to get an optimized situation.



    I hope to find and share my experience to the community ! need help !



    Thierry

  4. #4
    Join Date
    Nov 1999
    Posts
    9,729

    Default RE: Spoon : how to improve performance ?

    Hi Thierry,

    3 times slower then SQL*Loader seems reasonable once you realise that the data first has to be extracted of a TCP/IP connection from the DB2 database.
    Mind you, that this highly depends on the SQL*Loader script. If you would use data type conversions or other features that require the Oracle SQL Engine, that number would be more like 1.5-2 times slower. (accounting for the DB2 extraction over TCP/IP)

    After the data-type conversions (which are very cheap from DB2-Kettle-Oracle) the loading towards the Oracle database takes place.

    That loading process is *probably* as fast as it gets using batch inserts, prepared statements and transactions.
    The only thing that might give you a speedup (0-30%) is by loading several copies of the TableOutput. Click right on the Table Output step and use the "number of copies to launch..." option.
    This will reduce the latency of the network a bit more.

    Also, even though you have a "fast" network (wathever that means) it might be usefull to run the kettle transformation on the Oracle server itself. Certainly if you wan to compare with SQL*Loader.

    >! need help !

    Don't we all?

    All the best,
    Matt

  5. #5
    Join Date
    Apr 2006
    Posts
    2

    Default RE: Spoon : how to improve performance ?

    Hi Thierry,

    i think the insert-Statements would be faster, if the oracle tables would be able for parallel inserts.

    ALTER TABLE table_name PARALLEL (degree DEFAULT); ALTER INDEX index_name (degree DEFAULT);

    you have to change the oracle db parameter: parallel_automatic_tuning = true

    for smal tables the overhead for parallel calucaltion is high.
    May be it will help you.

    bye,

    Sven

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.