Hitachi Vantara Pentaho Community Forums

Thread: Out Of Memory Problem...Result Streaming Helped?

  1. #1

    Question Out Of Memory Problem...Result Streaming Helped?

    Kettle 2.5.0

    I was having the out of memory problems and tried everything I could find here on the forums and elsewhere on the Net, with no success. I have a very simple transformation with one Table input step and one Dimension lookup/update step. The source table has over 3M rows in it, so it is a decent-sized table, but not our biggest by any means.

    I went into the connection properties for the Table input step and turned on Use result streaming (cursor emulation). I'm connecting to a MySQL 5.0.18 database. This seems to have stopped the out of memory problem from happening. Any ideas why? I'm not entirely sure I understand this option.
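
    For context, what this checkbox maps to at the JDBC level: MySQL Connector/J only streams rows one at a time when the statement is forward-only, read-only, and has its fetch size set to Integer.MIN_VALUE. Below is a minimal sketch of that; the connection details, credentials, and table name are placeholders, not anything from this thread.

        import java.sql.*;

        public class StreamingQueryDemo {
            public static void main(String[] args) throws SQLException {
                // Placeholder connection details -- adjust for your environment.
                String url = "jdbc:mysql://localhost:3306/mydb";
                try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
                    // Connector/J streams rows only when the statement is forward-only,
                    // read-only, and the fetch size is exactly Integer.MIN_VALUE.
                    Statement stmt = conn.createStatement(
                            ResultSet.TYPE_FORWARD_ONLY,
                            ResultSet.CONCUR_READ_ONLY);
                    stmt.setFetchSize(Integer.MIN_VALUE);

                    try (ResultSet rs = stmt.executeQuery("SELECT id, name FROM big_table")) {
                        while (rs.next()) {
                            // Each next() pulls one row off the wire instead of loading
                            // the whole result set into client memory first.
                            System.out.println(rs.getLong(1) + " " + rs.getString(2));
                        }
                    }
                }
            }
        }

    Without those three conditions the driver falls back to reading the entire result set into client memory, which is the default behaviour and the likely source of the OutOfMemoryError.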

  2. #2

    Default

    Forgot to mention that it's the size of the source that seems to be the problem, as this transformation works when I limit the source to something like 100k rows. Connecting natively (JDBC)...

  3. #3
    DEinspanjer Guest

    Default

    I did a quick google search for "mysql result streaming" and found this blog post that seems to describe the situation pretty well...
    http://mtj.wordpress.com/2007/10/21/...sults-in-java/

  4. #4

    Default

    Hmmm, this option is not so good, it seems. I've turned it back off and now have my out of memory problem again. Ideas?

  5. #5

    Default

    Thank you for the link... I can't have that table locked, and there will be queries launched against it while the transformation is taking place. I'll keep fiddling with options, I guess, until I can get those 3M rows processed...

  6. #6
    DEinspanjer Guest

    Default

    Why did you come away with that impression?
    Basically, the article says that result streaming is for exactly the case where the client doesn't have enough memory to hold the entire result set. When you enable it, you pay the associated penalties: increased memory "churn", no concurrent updates to the table while the stream is open, and the requirement to process the entire result set within the time specified by net_write_timeout, or you will get timeout errors that abort your entire transformation.

    If you can allocate enough memory in Kettle to accommodate the result set, don't use streaming. If you can't, then use it.
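
    One hedged aside on the net_write_timeout point above: if the client processes rows slowly, the server can hit its write timeout mid-stream. A possible workaround (an assumption on my part, not something stated in the article) is to raise the timeout for the current session before running the streaming query:

        import java.sql.Connection;
        import java.sql.SQLException;
        import java.sql.Statement;

        final class SessionTimeout {
            // Raise net_write_timeout for this session only, so slow row-by-row
            // processing on the client does not trip the server's write timeout
            // and abort the streaming result mid-transfer.
            static void raise(Connection conn, int seconds) throws SQLException {
                try (Statement stmt = conn.createStatement()) {
                    stmt.execute("SET SESSION net_write_timeout = " + seconds);
                }
            }
        }

    In Kettle this would need to be issued as an initial SQL statement on the connection; whether 2.5.0 exposes a hook for that I can't say for certain.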

  7. #7

    Default

    It's not clear from that article that what it describes is actually done by setting that option on a connection. Doesn't it talk about setting the options defaultFetchSize and useCursorFetch?

    The combination of these several options has me confused.

  8. #8

    Default

    I guess what I would really appreciate (thanks for your help so far) is for someone to tell me which settings to give the Table input step in Kettle (2.5.0) so that it does not buffer results but delivers them one row at a time, so I don't get the out of memory error. The settings in question are below (a sketch of how they map to JDBC follows the list).

    On the Connection information dialog...

    MySQL tab

    Use result streaming (cursor emulation) (yes / no?)

    Options tab

    defaultFetchSize?
    useCursorFetch?
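
    For what it's worth, here is my understanding of how those options map onto the MySQL JDBC driver; treat it as a sketch, since the host, database, and table names are placeholders and I can't confirm exactly which combination Kettle 2.5.0 passes through. useCursorFetch=true asks the driver to use a server-side cursor (MySQL 5.0+ and Connector/J 5.0+), and defaultFetchSize plays the role of calling setFetchSize() on every statement, so rows come over in batches and client memory stays bounded without the single-row streaming restrictions.

        import java.sql.*;

        public class CursorFetchDemo {
            public static void main(String[] args) throws SQLException {
                // useCursorFetch enables a server-side cursor; defaultFetchSize
                // sets how many rows are fetched per round trip.
                String url = "jdbc:mysql://localhost:3306/mydb"
                        + "?useCursorFetch=true&defaultFetchSize=1000";
                try (Connection conn = DriverManager.getConnection(url, "user", "password");
                     Statement stmt = conn.createStatement();
                     ResultSet rs = stmt.executeQuery("SELECT id FROM big_table")) {
                    long count = 0;
                    while (rs.next()) {
                        // Rows arrive in batches of ~1000 rather than all at once.
                        count++;
                    }
                    System.out.println("rows processed: " + count);
                }
            }
        }

    In Kettle the equivalent would be leaving "Use result streaming" off and adding useCursorFetch=true and defaultFetchSize=1000 on the Options tab, but again, that is my reading of the driver documentation rather than something confirmed for 2.5.0.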

  9. #9

    Default

    You can change the memory buffer size (the number of rows buffered between steps) in the Misc tab of the transformation settings.

  10. #10

    Default

    I don't think changing the memory buffer size will help when it comes to bringing over millions of rows. Could you please elaborate on your comment and how it might help? This seems to be a very common problem, and I'm sure others will benefit from this discussion.
