Hitachi Vantara Pentaho Community Forums

Thread: Stream lookup issues

  1. #1
    Join Date
    Sep 2014
    Posts
    122

    Stream lookup issues

    Hi,
    I want to improve the performance of the Stream lookup step. Could you give me some tips?
    1. What is the data caching mechanism of the Stream lookup step?
    2. Scenario:
    a. File a has 3 columns: a1, a2, a3
    b. File b has 10 columns: b1, b2, b3 .. b10
    c. a does a stream lookup on b to get b.b9 where a.a3 = b.b3
    If I add a Select values step between b and the Stream lookup step that selects only b.b3 and b.b9, will this improve the Stream lookup performance?

  2. #2
    Join Date
    Apr 2008
    Posts
    1,771

    I have replied with a PM (private message)
    -- Mick --

  3. #3
    Join Date
    Sep 2014
    Posts
    122

    Do you know in which situations the "Preserve memory", "Key and value are exactly one integer field", and "Use sorted list" checkboxes of the Stream lookup step should be used to improve performance?
    Can anyone give me an answer?

  4. #4
    Join Date
    Sep 2014
    Posts
    122

    Can anyone help me?

  5. #5
    Join Date
    Jun 2012
    Posts
    5,534

    The Wiki page on Stream Lookup does explain the effects of the tuning options very well, I think.
    So long, and thanks for all the fish.

  6. #6
    Join Date
    Sep 2014
    Posts
    122

    Hi Marabu,


    I am new to Kettle. I have read the Wiki page on Stream lookup, but it only describes the effect of "Key and value are exactly one integer field" and "Use sorted list"; it does not say in which cases to use them.
    Could you give me some advice about that?

  7. #7
    Join Date
    Sep 2014
    Posts
    122

    I found the following information in another thread, but I still do not know in which situations the "Preserve Memory", "Sorted List", and "Integer Pair" options improve Stream lookup performance. Can someone give me an example?

    http://forums.pentaho.com/showthread...-field-quot-do
    If none of these optimizations are enabled, Kettle will use a HashMap to store the lookup data. It has to wrap the lookup data inside a new object that can be compared as well. I imagine this is the extra memory cost.

    If the Preserve Memory option is enabled but the other two aren't, Kettle will store the lookup data as raw bytes in a custom storage object that uses a hashcode of the bytes as the key. More CPU cost related to calculating the hashcode, less memory needed.
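
    As a rough illustration of the trade-off described above (this is not Kettle's actual code), the sketch below packs a key row and a value row into raw bytes and stores them in a dictionary keyed by those bytes. The names `encode_row`, `decode_row`, and `ByteRowStore` are hypothetical, and `struct` packing of integer fields is only a stand-in for whatever serialization Kettle really uses.

```python
import struct


def encode_row(fields):
    """Pack a tuple of integers into raw bytes (8 bytes per field)."""
    return struct.pack(f">{len(fields)}q", *fields)


def decode_row(data):
    """Unpack raw bytes back into a tuple of integers."""
    return struct.unpack(f">{len(data) // 8}q", data)


class ByteRowStore:
    """Hypothetical lookup store that keeps keys and values as raw bytes."""

    def __init__(self):
        self._table = {}

    def put(self, key_fields, value_fields):
        # Extra CPU: every row is serialized before it is stored.
        self._table[encode_row(key_fields)] = encode_row(value_fields)

    def get(self, key_fields):
        # Extra CPU: the probe key is serialized and the hit is deserialized.
        data = self._table.get(encode_row(key_fields))
        return None if data is None else decode_row(data)


store = ByteRowStore()
store.put((3,), (9, 42))
print(store.get((3,)))  # (9, 42)
print(store.get((4,)))  # None
```

    Each stored row is one compact byte string instead of a group of separate objects, which is where the memory saving in such a design would come from; the price is the packing and unpacking on every call.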

    If the Sorted List option is enabled too, the lookup data is put into a tuple and stored in a sorted list. Lookups are done via a binary search.
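
    A minimal sketch of the sorted-list idea, assuming the lookup rows are (key, value) tuples that are sorted once after loading and then searched with a binary search. `SortedListLookup` is a hypothetical name, and this is only an illustration of the technique, not Kettle's implementation.

```python
from bisect import bisect_left


class SortedListLookup:
    """Hypothetical lookup table backed by a sorted list of (key, value) tuples."""

    def __init__(self, pairs):
        # Sort once, after all lookup rows have been read.
        self._rows = sorted(pairs, key=lambda kv: kv[0])
        self._keys = [k for k, _ in self._rows]

    def get(self, key):
        # Binary search: O(log n) comparisons per lookup.
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._rows[i][1]
        return None


lookup = SortedListLookup([(30, "c"), (10, "a"), (20, "b")])
print(lookup.get(20))  # b
print(lookup.get(25))  # None
```

    A sorted list has no hash buckets to maintain, but every lookup costs O(log n) comparisons instead of the roughly constant time of a hash table.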

    If the Integer Pair option is enabled and the sorted list option is not enabled, the lookup data is stored in a custom storage object that is similar to the byte array hashmap, but it doesn't have to convert to raw bytes. It just takes a hashcode of the long and does some funky stuff with it. Honestly, I don't see any reason that it requires the value to be a long too. That is either a blind spot on the part of the developer or of me.
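
    To make the integer-pair idea more concrete, here is a sketch of a map from one 64-bit integer to another that keeps keys and values in flat primitive arrays and uses open addressing with linear probing. `LongPairMap` is a hypothetical name and the probing scheme is an assumption; the quoted post does not say how Kettle's own storage object hashes or resolves collisions.

```python
from array import array


class LongPairMap:
    """Hypothetical open-addressing map from one 64-bit integer to another."""

    def __init__(self, capacity=16):
        self._capacity = capacity
        self._keys = array("q", [0] * capacity)    # signed 64-bit keys
        self._values = array("q", [0] * capacity)  # signed 64-bit values
        self._used = bytearray(capacity)           # 1 where a slot is occupied
        self._size = 0

    def _slot(self, key):
        # Linear probing: start at the hashed slot, walk until the key or a free slot.
        i = hash(key) % self._capacity
        while self._used[i] and self._keys[i] != key:
            i = (i + 1) % self._capacity
        return i

    def _grow(self):
        # Double the capacity and reinsert every stored pair.
        old = [(k, v) for k, v, u in zip(self._keys, self._values, self._used) if u]
        self.__init__(self._capacity * 2)
        for k, v in old:
            self.put(k, v)

    def put(self, key, value):
        # Keep the table at most half full so probing always finds a free slot.
        if (self._size + 1) * 2 > self._capacity:
            self._grow()
        i = self._slot(key)
        if not self._used[i]:
            self._used[i] = 1
            self._size += 1
        self._keys[i] = key
        self._values[i] = value

    def get(self, key):
        i = self._slot(key)
        return self._values[i] if self._used[i] else None


m = LongPairMap()
m.put(3, 9)
print(m.get(3))  # 9
print(m.get(4))  # None
```

    Because keys and values live in primitive arrays, there is no per-entry wrapper object and no conversion to bytes, which only works when both the key and the value are single integers.
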
    Last edited by chen5132649; 12-30-2014 at 11:13 PM.

  8. #8
    Join Date
    Sep 2014
    Posts
    122

    Can anyone give me some help?
