Hitachi Vantara Pentaho Community Forums

Thread: Stream lookup causes OutOfMemory

  1. #1
    Join Date
    Oct 2011
    Posts
    2

    I created a transformation that produces its output by looking up values in a master CSV file.

    First I built it with the "Stream lookup" step,
    but the large master CSV causes an OutOfMemory error.

    Then I rebuilt it with "Sort rows" and "Merge join".
    I think this works better on a low-memory PC.
    Is this the best approach for a low-memory PC?

    Or does "Stream lookup" have an option like "Store to disk"?
    Here are screenshots of my transformations.
    Sorry for my English.
    Attached Images

  2. #2
    Join Date
    Sep 2011
    Posts
    190

    What are your memory settings for Spoon? Are they still default?
    If so, you may try higher settings for the Java Heap Space.
    Look for -Xmx512m in spoon.bat (or spoon.sh on Linux/macOS).

  3. #3
    Join Date
    Apr 2008
    Posts
    1,771

    Hi.
    If you use Stream Lookup you always need a sort (just in case) before each Lookup step.
    If I remember correctly, in each of those steps there is an option that allows you to change the number of rows committed to memory.
    You can use a lower number if you have little memory, though obviously that will slow down your stream.

    Mick

  4. #4
    Join Date
    Apr 2007
    Posts
    2,010

    Eh? Stream lookup doesn't require a sort beforehand. You're thinking of the merge steps.
    Obviously internally stream lookup will have to sort and index the data.

  5. #5
    Join Date
    Apr 2008
    Posts
    1,771

    Ehm...
    "Obviously internally stream lookup will have to sort and index the data."
    It's not so obvious ;-)

    Even the GROUP BY should sort and index the data internally.. but there's always a warning when I use it.
    That's why I suggested for the Stream Lookup to do a sort beforehand.
    I always do it anyway.. never trust a machine..
    "To err is human, but to really foul things up requires a computer".

    Mick

  6. #6
    Join Date
    Nov 1999
    Posts
    9,729

    Stream lookup doesn't need sorted input, nor does it sort internally. It mostly uses a hash table.
    The same goes for Memory Group By: it doesn't need a sort either. :-)
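
To make the hash-table point concrete, here is a minimal Python sketch of a hash-based lookup. It is not Pentaho's actual code: the function name `stream_lookup` and the sample rows are made up for illustration. The whole master stream is indexed in memory before the first main row is processed, so no sort is needed, but memory use grows with the size of the master data.

```python
def stream_lookup(main_rows, master_rows, key, value, default=None):
    """Hash-based lookup: index the master rows in memory, then probe per main row.

    Hypothetical illustration, not Pentaho code.
    """
    # Build phase: every master key/value pair is held in memory at once.
    index = {}
    for row in master_rows:
        index[row[key]] = row[value]

    # Probe phase: main rows stream through in any order, no sort required.
    for row in main_rows:
        out = dict(row)
        out[value] = index.get(row[key], default)
        yield out


# Made-up sample data; neither input is sorted.
master = [
    {"id": 3, "name": "cherry"},
    {"id": 1, "name": "apple"},
    {"id": 2, "name": "banana"},
]
main = [{"id": 2, "qty": 5}, {"id": 9, "qty": 1}, {"id": 1, "qty": 7}]

result = list(stream_lookup(main, master, key="id", value="name"))
for r in result:
    print(r)
```

The `index` dict is where the memory goes: with a big master file it can outgrow the heap, which would match the OutOfMemory reported in this thread.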

  7. #7
    Join Date
    Oct 2011
    Posts
    2

    My PC has 2048 MB of memory, and -Xmx is already at its limit of 1024 MB.
    Running Stream lookup with -Xmx1024m still often fails with OutOfMemory.

    In the end, Sort rows plus Merge join is the better choice for getting a correct result, although it takes a long time.
    Thank you all
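
For comparison, here is a rough Python sketch of the sort-then-merge approach, again not Pentaho's implementation. The name `merge_join` and the sample rows are hypothetical, master keys are assumed to be unique, and `sorted()` stands in for the Sort rows step. Once both inputs are sorted on the key, the join only needs to hold one row from each side at a time.

```python
def merge_join(main_sorted, master_sorted, key, value, default=None):
    """Left-outer merge join of two iterables sorted ascending by `key`.

    Hypothetical illustration; assumes master keys are unique.
    """
    master_iter = iter(master_sorted)
    current = next(master_iter, None)
    for row in main_sorted:
        # Advance the master side until its key is no longer behind the main key.
        while current is not None and current[key] < row[key]:
            current = next(master_iter, None)
        out = dict(row)
        if current is not None and current[key] == row[key]:
            out[value] = current[value]
        else:
            out[value] = default  # no matching master row
        yield out


# Made-up sample data; sorted() plays the role of the sort step here.
master = [
    {"id": 3, "name": "cherry"},
    {"id": 1, "name": "apple"},
    {"id": 2, "name": "banana"},
]
main = [{"id": 2, "qty": 5}, {"id": 9, "qty": 1}, {"id": 1, "qty": 7}]

by_id = lambda row: row["id"]
joined = list(merge_join(sorted(main, key=by_id), sorted(master, key=by_id),
                         key="id", value="name"))
for r in joined:
    print(r)
```

The trade-off is the one described above: the sort costs time, but the join itself no longer needs the whole master file in memory.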

  8. #8
    Join Date
    Apr 2007
    Posts
    2,010

    Yeah, I noticed that Memory Group By isn't equivalent to a sort followed by Group By!
    But I've never used sorts before stream lookups: a total waste of time and resources.
