Moving medium amounts of data from one db to another



marc_swingler
04-05-2006, 07:09 AM
Attachment: spoon.zip (http://forums.pentaho.org/archived_att/kettle/spoon.zip)

Matt,


I have a table, CAR_PKT_GEOM, in one db which contains ~450K rows. Eventually I want to transform this data before moving it to a second db, but for now I just want to try moving it from one place to the other using a straight Table Input and Table Output. If I limit the read from the origin db to 50K rows, everything goes great (takes about 70 sec). At 100K it slows a little, but it's not bad (3.5 min). At 200K it slows more (11 min). Limiting the query to 400K, or setting the Limit size to 0, causes the process to grind to a halt (I get an IO Exception after 8 hrs).

I don't want to partition the table in the target db, so how do I speed the process up? I'm considering just grabbing 50K rows at a time and looping until finished, but I'm not sure that's the best solution. Any help appreciated. (ktr and log file attached)
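For illustration, here's roughly what I mean by looping (a rough sketch over plain JDBC; the chunk size, connection details, and the ID ordering column are assumptions, not what's in my ktr):

    // Hypothetical chunked read: pull CHUNK rows per query so the driver
    // never has to buffer all ~450K rows at once.
    import java.sql.*;

    public class ChunkedCopy {
        static final int CHUNK = 50000;

        public static void main(String[] args) throws Exception {
            Connection src = DriverManager.getConnection(
                    "jdbc:mysql://localhost/source_db", "user", "pass");
            Statement st = src.createStatement();
            for (int offset = 0; ; offset += CHUNK) {
                // ORDER BY a stable key (ID is assumed) so pages don't overlap
                ResultSet rs = st.executeQuery(
                        "SELECT * FROM CAR_PKT_GEOM ORDER BY ID" +
                        " LIMIT " + CHUNK + " OFFSET " + offset);
                int rows = 0;
                while (rs.next()) {
                    rows++;
                    // ... insert the row into the target db here ...
                }
                rs.close();
                if (rows < CHUNK) break; // short page = table exhausted
            }
            src.close();
        }
    }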



Marc

MattCasters
04-05-2006, 10:00 AM
Hi Marc,

PostgreSQL suffered from a similar problem: there was a JDBC particularity to work around before setFetchSize() on the prepared statement had any effect. Take the dev version of Kettle for a spin (Documentation / Development packages: kettle.jar) if you ever run into that one.

In the case of MySQL, I'm not sure there is anything we can do. Even the latest versions of MySQL and the latest JDBC drivers have this problem: they simply load the complete table into memory, causing all kinds of problems.
If anyone knows why setFetchSize is not working on MySQL, let me know, OK?
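For context, the hint in question is just this, on the prepared statement (a minimal sketch, connection setup omitted):

    // setFetchSize() is only a hint: it asks the driver to pull N rows
    // per round trip instead of materializing the whole result set.
    PreparedStatement ps = conn.prepareStatement("SELECT * FROM CAR_PKT_GEOM");
    ps.setFetchSize(1000);             // some drivers honor this, some ignore it
    ResultSet rs = ps.executeQuery();  // a driver that ignores the hint can
                                       // still load the complete table in memory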

Don't have the time right now to go after it any further, will check in a few days again.

Cheers,
Matt

marc_swingler
04-05-2006, 11:36 AM
Thanks for the quick response. I haven't tried it yet, but it looks like the 3.0 MySQL JDBC drivers don't have the same problem. I'll try modifying the MySQLDatabaseMeta class to use the old driver later today to see what happens. Meanwhile, ODBC seems to work for now.
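For what it's worth, a quick standalone way to time the old driver outside Kettle (a rough sketch; the connection details are placeholders):

    // Register the 3.0-series Connector/J driver explicitly, then time
    // the same full-table SELECT that the Table Input step runs.
    Class.forName("com.mysql.jdbc.Driver");
    Connection conn = DriverManager.getConnection(
            "jdbc:mysql://localhost/source_db", "user", "pass");
    Statement st = conn.createStatement();
    long t0 = System.currentTimeMillis();
    ResultSet rs = st.executeQuery("SELECT * FROM CAR_PKT_GEOM");
    int n = 0;
    while (rs.next()) n++;
    System.out.println(n + " rows in " + (System.currentTimeMillis() - t0) + " ms");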

MattCasters
04-05-2006, 11:44 AM
No problem. We need this to work for the test we have scheduled on the 5.1 beta of MySQL (the partitioning support), so I'd better find a solution sooner or later ;-)

For Postgres the solution was to disable auto-commit on the connection. (Can you believe that, on a read-only connection?)
It's probably a similar puzzle we need to solve here for MySQL.
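For reference, the Postgres fix boils down to this combination (the fetch size value is just an example):

    // The PostgreSQL JDBC driver only uses a server-side cursor when
    // auto-commit is off AND a fetch size is set; otherwise it buffers
    // the entire result set client-side.
    conn.setAutoCommit(false);  // yes, even on a read-only connection
    Statement st = conn.createStatement();
    st.setFetchSize(1000);      // rows per round trip (example value)
    ResultSet rs = st.executeQuery("SELECT * FROM some_big_table");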

Thanks for the feedback!
Matt

kettle_anonymous
04-18-2006, 01:33 AM
MySQL has the same feature. You can turn it off with SET autocommit=0.
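From JDBC that would be the same one-liner as on Postgres (whether Connector/J then streams is the open question):

    conn.setAutoCommit(false);  // equivalent to issuing SET autocommit=0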

Jonathon

MattCasters
04-18-2006, 01:36 AM
Really? That's amazing. I guess they're stealing bugs from one another ;-)
Open source isn't ALWAYS perfect, it seems.

OK, I'll try it later this evening. (CET)

Thanks,

Matt