Hitachi Vantara Pentaho Community Forums

Thread: Performance between Stream lookup and Merge join

  1. #1

    Performance between Stream lookup and Merge join

    Hello,

    I have a large CSV file of around 10 million rows, which has to be joined against the main stream data to fetch some fields.

    I have two options:

    1) A Stream lookup, using the main stream data as the lookup data
    2) A Merge join


    Which one would be better in terms of memory use and performance?

    - At present I am using Merge join, but I would like to know the performance considerations of using Stream lookup.

    Please advise.

  2. #2

    Hi

    I would think that if you can check the three options at the bottom of the Stream lookup step, i.e. "Preserve memory (costs CPU)", "Key and value are exactly one integer field" and "Use sorted list (i.s.o. hashtable)", then this step may be faster than Merge join, but you have to satisfy the conditions for using them.

    see http://wiki.pentaho.com/display/COM/...tion+in+Kettle
    http://wiki.pentaho.com/display/EAI/Stream+Lookup
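
    To show why memory matters for Stream lookup, here is a rough Python sketch of the general idea: the lookup stream is read fully into an in-memory table, then the main stream passes through row by row. This is not Kettle's code; the function name `stream_lookup` and its parameters are made up for illustration.

```python
def stream_lookup(main_rows, lookup_rows, key, fields, default=None):
    """Conceptual in-memory lookup (hypothetical helper, not Kettle's API)."""
    # The whole lookup stream is held in memory, so its size drives memory use.
    table = {row[key]: row for row in lookup_rows}
    for row in main_rows:
        match = table.get(row[key])
        out = dict(row)
        for field in fields:
            # Rows without a match get the default value for each field.
            out[field] = match[field] if match is not None else default
        yield out


main = [{"id": 2, "qty": 5}, {"id": 9, "qty": 1}]
lookup = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
result = list(stream_lookup(main, lookup, "id", ["name"]))
```

    With this approach neither stream has to be sorted, but whichever stream feeds the lookup side has to fit in memory.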

    Merge join expects both input streams to be sorted on the join key.
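
    The sorting requirement comes from how a sort-merge join walks the two inputs in step. Below is a minimal Python sketch of an inner join done that way, assuming rows are dicts and the inputs are already sorted on the key. It is only an illustration of the technique, not Kettle's implementation, and `merge_join` is a hypothetical name.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(left_sorted, right_sorted, key):
    """Inner sort-merge join; both inputs must already be sorted on `key`."""
    get_key = itemgetter(key)
    left_groups = groupby(left_sorted, key=get_key)
    right_groups = groupby(right_sorted, key=get_key)
    left = next(left_groups, None)
    right = next(right_groups, None)
    while left is not None and right is not None:
        left_key, left_rows = left
        right_key, right_rows = right
        if left_key < right_key:
            left = next(left_groups, None)      # no partner on the right, skip
        elif left_key > right_key:
            right = next(right_groups, None)    # no partner on the left, skip
        else:
            # Only the rows for the current key are buffered in memory.
            right_rows = list(right_rows)
            for l_row in left_rows:
                for r_row in right_rows:
                    yield {**r_row, **l_row}
            left = next(left_groups, None)
            right = next(right_groups, None)


left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 2, "a": "z"}, {"id": 4, "a": "w"}]
right = [{"id": 2, "b": 10}, {"id": 3, "b": 20}, {"id": 4, "b": 30}]
joined = list(merge_join(left, right, "id"))
```

    Memory use stays small because only one key group is held at a time, but if either input is not sorted the loop silently misses matches, and sorting 10 million rows first has its own cost.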

    regards

    Naresh
