


MattCasters
03-05-2007, 10:50 AM
Pretty Sick Slick
Last week I was under the weather, and a year ago that would have meant that development of Pentaho Data Integration (http://kettle.pentaho.org) (PDI) would pretty much have stopped. These days I’m happy to say that this is absolutely not true anymore. In fact, hundreds of commits were made in the last week.
MySQL bulk load
To pick one example, Samatar Hassan added a MySQL Bulk Load job entry:
http://www.kettle.be/images/mysql-bulk-load.png
This job entry loads data into a MySQL database as fast as possible by using the LOAD DATA SQL command (http://dev.mysql.com/doc/refman/5.0/en/load-data.html). It’s not as flexible as the Text File Input step, but it sure is fast: in certain cases it can be up to ten times as fast. In short: another great job by Samatar!
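To give an idea of what a LOAD DATA statement looks like, here is a minimal sketch in Python. The helper `build_load_data_sql` is hypothetical and is not part of PDI; it is not how the job entry is implemented, it only assembles statement text following the LOAD DATA syntax described in the MySQL 5.0 reference manual. It assumes a tab-separated file with Unix line endings by default (the same defaults MySQL uses) and a server that is not running with the NO_BACKSLASH_ESCAPES SQL mode.

```python
from typing import Optional, Sequence

# Escape sequences understood inside MySQL string literals
# (assumes NO_BACKSLASH_ESCAPES is NOT enabled on the server).
_ESCAPES = {"\\": "\\\\", "'": "\\'", "\t": "\\t", "\n": "\\n", "\r": "\\r", "\0": "\\0"}


def sql_string(value: str) -> str:
    """Quote a Python string as a MySQL string literal."""
    return "'" + "".join(_ESCAPES.get(ch, ch) for ch in value) + "'"


def quote_ident(name: str) -> str:
    """Quote a single identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_load_data_sql(
    file_path: str,
    table: str,
    field_sep: str = "\t",
    line_term: str = "\n",
    enclosed_by: Optional[str] = None,
    ignore_lines: int = 0,
    columns: Optional[Sequence[str]] = None,
    local: bool = False,
) -> str:
    """Hypothetical helper: assemble a LOAD DATA [LOCAL] INFILE statement."""
    parts = [
        f"LOAD DATA {'LOCAL ' if local else ''}INFILE {sql_string(file_path)}",
        f"INTO TABLE {quote_ident(table)}",
        f"FIELDS TERMINATED BY {sql_string(field_sep)}",
    ]
    if enclosed_by is not None:
        # Still part of the FIELDS clause; whitespace between lines is fine.
        parts.append(f"OPTIONALLY ENCLOSED BY {sql_string(enclosed_by)}")
    parts.append(f"LINES TERMINATED BY {sql_string(line_term)}")
    if ignore_lines > 0:
        # Skip header rows at the start of the file.
        parts.append(f"IGNORE {int(ignore_lines)} LINES")
    if columns:
        # Map file fields to these table columns, in order.
        parts.append("(" + ", ".join(quote_ident(c) for c in columns) + ")")
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_load_data_sql("/tmp/sales.csv", "sales", field_sep=",",
                              enclosed_by='"', ignore_lines=1))
```

The resulting text could be passed to any MySQL client or DB-API cursor. With `local=True` the file is read on the client side, which only works when both client and server allow `local_infile`.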
I’m being told that Samatar is also writing a bulk loader for Microsoft SQL Server and that Sven is working on an Oracle SQL*Loader wrapper.
Wait, there’s more…
In addition to that, I saw the following job entries appear in the last couple of weeks: File Compare, SFTP Put, Display Messagebox Info, Wait for, Zip File and last but not least: XSLT Transformation. We also added the Formula (http://forums.pentaho.org/showthread.php?t=51810) and Abort steps. I’ll get back to you on the Formula step later as it’s an interesting option, although far from complete.
Evil Voices
Evil voices among my readership might suggest that I get sick a bit more often. However, because of the highly modular nature of PDI, it is perfectly possible to develop code in parallel in a safe way. I can assure you all that it’s not as if I’m only now being forced to let other developers contribute. Anyone who has a great idea and wants to donate code to the PDI project is welcome to do so at any time. The latest avalanche of code is just more proof that open source works and that a project gains a lot in the long run by opening up.
Today there are around 48 people who have write access to the Subversion code repository, and around 5 to 15 people commit code in any given month.
Release management
That is all great, but it does make release management a bit more difficult. I think we should probably plan for a two- to three-week delay to get all the new features translated, documented and tested a bit more. Of course, you can help out with that as well, or you can just let us know how you feel about all these new developments.
Another small problem is that, with all these new features, it’s almost ridiculous to do a point release (2.4.1) now.
Until next time,
Matt


More... (http://www.ibridge.be/?p=35)