-
Out Of Memory Problem...Result Streaming Helped?
Kettle 2.5.0
I was having the out of memory problems and tried everything I could findhere on the forums and elsewhere on the Net, with no success. I have a very simple transformation that has one Table input and one Dimension lookup/update step. The source table has over 3M rows in it, so it is a decent size table, but not our biggest by any means.
I went in to the connection properties for the Table input step and turned on Use result streaming (cursor emulation). I'm connecting to a MySQL 5.0.18 database.This seems to have has stopped the out of memory problem from happening. Any ideas why? I'm not entirely sure I understand this option.
-
Forgot to mention that it's the size of the source that seems to be the problem as this transformation works when I limit the source to something like, 100k rows. Connecting natively (JBDC)...
-
I did a quick google search for "mysql result streaming" and found this blog post that seems to describe the situation pretty well...
http://mtj.wordpress.com/2007/10/21/...sults-in-java/
-
Hmmm, this option is not so good, it seems. I've turned it back off and now have my out of memory problem again. Ideas?
-
Thank you for the link...I can't have that table locked and there will be queries launched against that table while the transformation is taking place. I'll keep fiddling with options I guess. Until I can get those 3M rows processed...
-
Why did you come away with that impression?
Basically, the article says that if you don't have enough memory on the client to store the entire resultset in memory that this is what result streaming is for. When you enable it, you pay the penalties associated: increased memory "churn", prevention of concurrent updates, and you have to process the entire resultset within the time specified in net_write_timeout or you will get timeout errors that will abort your entire transformation.
If you can allocate enough memory in Kettle to accomidate the resultset, don't use streaming. If you can't do so, then use it.
-
It's not clear in that article that what it talks about is actually done through setting that option for a connection. Doesn't it talk about setting the option defaultFetchSize and userCursorFetch?
The combination of these several opions has me confused.
-
I guess what I would really appreciate (thanks for your help so far) is someone to tell me which settings I need to tell the Table input step in Kettle (2.5.0) not to buffer results, but to deliver them one row at a time so I don't get the out of memory error.
On the Connection information dialog...
MySQL tab
Use result streaming (cursor emulation) (yes / no?)
Options tab
defaultFetchSize?
userCursorFetch?
-
You can change the memory buffer size in the misc tab of transformation settings.
-
I don't think changing the memory buffer size will help when it comes to bringing over millions of rows? Could you please elaborate on your comment and how it might help? This seems to be a very common problem and I'm sure others will benefit from this discussion.
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
Forum Rules