PDA

View Full Version : Spoon : how to improve performance ?



kettle_anonymous
04-06-2006, 11:03 PM
Hi,


I've to download (from db2-400) & upload (to Oracle 9i) large data files (>1 million record) with many columns.



I found Spoon very slow to process these files and I consider to maintain my old fashion tools for this kind of process (pl/sql , sql*loader ...).



is there any way to improve Spoon performance ?






Thank you for your help,



Thierry

MattCasters
04-08-2006, 06:52 AM
Thierry,

Sure, use SQL*Loader, why don't you?
I would be interested in seeing how off-loading the DB2 data to text file and then loading the text-file with SQL*Loader can be any faster then the way you do it now, but hey, knock yourself out.

In the mean time consider this:
- Are there many indexes on the target table
- Are you running on a slow or fast network (from DB2? to Oracle?)
- What are "Many columns?" 10-20-50-200-500?
- ...

I've seen instances where a Kettle transformation would insert 5000 rows per second and I guess that was OK.
In other conditions you might get 300-400 rows per second and that would still be perfectly OK because of the many bitmap indexes etc.

Hope this helps,

Matt

kettle_anonymous
04-08-2006, 07:29 AM
Hi Matt,


- I'm testing kettle in the same production environment I use to load my datawarehouse. (Fast network, dual processor, 4 gb memory ...)



- The target table has no index when I load it and it has 180 columns.



- Downloading to a text file and loading via sql*loader (with direct parameter) is currently 3 times faster than Kettle.



- With Kettle I get around 300 records per seconds. I use only 2 steps: one input and one output.



- I really would like to improve performances with Kettle, because it has useful concepts and features. I tried a lot of different settings but I currently failed to get an optimized situation.



I hope to find and share my experience to the community ! need help !



Thierry

MattCasters
04-08-2006, 11:39 AM
Hi Thierry,

3 times slower then SQL*Loader seems reasonable once you realise that the data first has to be extracted of a TCP/IP connection from the DB2 database.
Mind you, that this highly depends on the SQL*Loader script. If you would use data type conversions or other features that require the Oracle SQL Engine, that number would be more like 1.5-2 times slower. (accounting for the DB2 extraction over TCP/IP)

After the data-type conversions (which are very cheap from DB2-Kettle-Oracle) the loading towards the Oracle database takes place.

That loading process is *probably* as fast as it gets using batch inserts, prepared statements and transactions.
The only thing that might give you a speedup (0-30%) is by loading several copies of the TableOutput. Click right on the Table Output step and use the "number of copies to launch..." option.
This will reduce the latency of the network a bit more.

Also, even though you have a "fast" network (wathever that means) it might be usefull to run the kettle transformation on the Oracle server itself. Certainly if you wan to compare with SQL*Loader.

>! need help !

Don't we all?

All the best,
Matt

skraemer
04-18-2006, 06:34 AM
Hi Thierry,

i think the insert-Statements would be faster, if the oracle tables would be able for parallel inserts.

ALTER TABLE table_name PARALLEL (degree DEFAULT); ALTER INDEX index_name (degree DEFAULT);

you have to change the oracle db parameter: parallel_automatic_tuning = true

for smal tables the overhead for parallel calucaltion is high.
May be it will help you.

bye,

Sven