4.3 million rows per second



MattCasters
10-12-2007, 04:40 PM
Earlier today I was building a test case in which I wanted to put a lot of Unicode data into a database table. The problem, of course, is that I don’t have a lot of data, just a small Excel input file.
So I made a Cartesian product with a couple of empty row generators:
http://www.kettle.be/images/4M-rps-trans.png
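For readers who can't see the screenshot, the idea can be sketched in a few lines of Python. This is only an illustration of how a Cartesian product multiplies a small input, not how Kettle implements its join step; the sample strings and the row counts (1000 per generator) are made up, since the post doesn't state the actual sizes.

```python
from itertools import islice, product

# Hypothetical sizes: the post does not say how many rows each step produced.
excel_rows = ["héllo", "wörld", "日本語"]  # stand-in for the small Unicode input
generator_a = range(1000)                 # stand-in for the first empty row generator
generator_b = range(1000)                 # stand-in for the second empty row generator


def cartesian_rows(data, *generators):
    """Yield each data row once per combination of generator rows."""
    for combo in product(data, *generators):
        # The generator rows carry no fields, so only the data row is kept.
        yield combo[0]


# Row count of a Cartesian product is the product of the input sizes.
expected = len(excel_rows) * len(generator_a) * len(generator_b)
first_five = list(islice(cartesian_rows(excel_rows, generator_a, generator_b), 5))

print(expected)
print(first_five)
```

Three input rows joined against two 1000-row generators already give three million output rows, which is why a couple of empty row generators are enough to blow up a tiny file.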
It was interesting to see how fast the second join step was generating rows:
http://www.kettle.be/images/4M-rps-log.png
Yes, you are reading that correctly: 717 million rows processed in 165 seconds, or roughly 4.3 million rows per second.
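The arithmetic is easy to check. A quick sketch, using the rounded figures quoted above (the exact row count is in the log screenshot and is not reproduced here):

```python
rows_processed = 717_000_000  # "717 million rows", as rounded in the post
elapsed_seconds = 165

rows_per_second = rows_processed / elapsed_seconds
print(f"{rows_per_second / 1e6:.1f} million rows per second")
```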
For those of you who would love to try this on your own machine, here is an exclusive present (http://s3.amazonaws.com/kettle3/Kettle-3.0.0-RC2-20071012.zip) for the readers of this blog: a 3.0.0-RC2 preview build of 2007/10/12 (88MB zip file). We’ve been fixing bugs like crazy, so it’s pretty stable for us, but it’s still a few weeks until we release RC2. Don’t do anything crazy with this drop! This is purely a present for the impatient ones. If you find a bug, please file it (http://jira.pentaho.org/browse/PDI)! (give us a present back :-))
Until next time,
Matt


More... (http://www.ibridge.be/?p=76)