I have made the following kettle transformation. It consists out of 2 steps a "table input" and "dimension lookup/update" 9 out of 10 run's give an error message "java.io.EOFException" I attached the kettle log and the transformation.
That was a JDBC driver problem (we've seen it a couple of times), it depends which version of MySQL you're running. Maybe it's possible for you to put the version of the JDBC driver you get with your version of MySQL in the libext directory of kettle (and then in your version probably change the .bat/.sh file you use to start spoon).
You can also give Kettle v2.4.1 a spin, only for testing.
I have tested with version 3.1.13, 5.0.5. Both have the same problem.
The problem does not occur if i run the sql's from the command line mysql client. How can i show mysql support (i am paying customer) that their driver has a problem?
That always a sticky point... sent them your example kettle transformations ;-)
Also try version 3.1.14 of the MySql JDBC driver, from past mails it seems to function more ok than others. And be sure to really remove the old JDBC driver.
sven, this is a comment of a mysql engineer. Maybe this is usefull for you. Could you check the code and give your comments.
2007/03/13 16:12:49 - reporting - ERROR (version 2.3.1, build 63 from 2006/09/14 12:04:05 @ sam) : Streaming result set com.mysql.jdbc.RowDataDynamic@675039 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
Either you or Pentaho have enabled streaming result sets (a fast-forward, read-only result set that uses the minimum fetch size), and have issued a new statement on the connection without retrieving all of the previous esult set.
Basically, this is bad code or bad configuration.
We have a JDBC URL property you can use, clobberStreamingResults, that simply tosses out the old result set when a new statement is issued on the connection, but that is only useful if you really were done with the previous ResultSet.
You probably were not done with the results, so this is probably not what you want.
Streaming result sets are required for dealing with large results. Without them, the entire result set has to be loaded into RAM, which can require an enormous amount of memory.
Whoever has written the code involved will have to use these properly, or if it is acceptable to toss out the old result set, you may try the URL configuration property.
Regards,
--
Dean Ellis, Senior Support Manager, Americas
MySQL Inc., www.mysql.com
Are you MySQL Certified? http://www.mysql.com/certification
If your engineer is right, I would expect it to always fail. The table input step is not that big... its function is just to read a resultset and put it row objects. No complicated multiple SQL statements... open read close.
I do know some people had problems with streaming results and that in the end they were solved by one specific JDBC driver version. But not the very latest MySql JDBC driver since that was a terrible version.
So first try JDBC 3.1.14 with Kettle 2.3.1. Then try Kettle 2.4.1-M1 with the latest kettle.jar. And let us know your results.
JDBC 3.1.14 gives a better result, but the error is still happening. (frequency of the error is lower)
There is no difference between 2.3.1 and the 2.4 version.
Wat i saw is that the read counter on line 2 (Dimension lookup/update) is not increasing for a while. When the error on line 1 is shown, it updates the read count field. the increase can be over 1500 records. What i think is happening is that, step 2 takes so much time that the "net_read_timeout" and/or the "net_write_timeout" values (30s / 60s) of the mySQL connections are exceeded and the connection is aborted.