Hitachi Vantara Pentaho Community Forums

Thread: Performance between Stream lookup and Merge join

  1. #1

    Performance between Stream lookup and Merge join


    I have a huge CSV file of around 10 million rows, which has to be joined against the main stream data to get some fields.

    I have two options:

    1) A Stream lookup, using the main stream data as the lookup data
    2) A Merge join

    Which one would be better at processing time, considering memory and performance?

    At present I am using Merge join, but I would like to know the performance considerations of using Stream lookup.

    Please advise.

  2. #2
    Join Date
    Aug 2008



    I would think that if you can check the three options at the bottom of the Stream lookup step, i.e. "Preserve memory (costs CPU)", "Key and value are exactly one integer field" and "Use sorted list (i.s.o. hashtable)", then this step may be faster than Merge join, but your data has to satisfy the conditions for using them.
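
    To make the memory trade-off concrete, here is a rough Python sketch of what a stream lookup does conceptually. It is not Pentaho code; the function name `stream_lookup` and its parameters are made up for illustration, and keeping the last row for a duplicate key is an assumption of this sketch. The point is that the whole lookup side is held in memory before the first main row is processed, so the lookup side should be the smaller of the two inputs.

```python
def stream_lookup(main_rows, lookup_rows, key, fields, default=None):
    """Enrich each main row with `fields` taken from the matching lookup row.

    Hypothetical helper: all lookup rows are indexed in memory first,
    which is where the memory cost of this approach comes from.
    """
    # Build the in-memory index once (assumption: last duplicate key wins).
    index = {}
    for row in lookup_rows:
        index[row[key]] = row

    # Stream the main rows; each lookup is a single dictionary access.
    for row in main_rows:
        match = index.get(row[key])
        out = dict(row)
        for field in fields:
            out[field] = match[field] if match is not None else default
        yield out


main = [{"id": 1}, {"id": 2}, {"id": 3}]
lookup = [{"id": 1, "name": "a"}, {"id": 3, "name": "c"}]
result = list(stream_lookup(main, lookup, "id", ["name"]))
```

    Neither input needs to be sorted here, but memory use grows with the size of the lookup side.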


    Merge join expects both input streams to have been sorted on the join key.
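
    The sorting requirement is easier to see in a small Python sketch of a sort-merge inner join. Again, this is only an illustration of the general technique, not how Pentaho implements the step; `merge_join` and its parameters are hypothetical names. Both inputs are assumed to be already sorted on the join key, and only the rows for the current key are held in memory.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(left_rows, right_rows, key):
    """Inner-join two iterables of dicts that are both sorted on `key`."""
    get_key = itemgetter(key)
    left_groups = groupby(left_rows, key=get_key)
    right_groups = groupby(right_rows, key=get_key)
    left = next(left_groups, None)
    right = next(right_groups, None)

    while left is not None and right is not None:
        left_key, left_group = left
        right_key, right_group = right
        if left_key < right_key:
            # No partner on the right for this key; advance the left side.
            left = next(left_groups, None)
        elif left_key > right_key:
            # No partner on the left for this key; advance the right side.
            right = next(right_groups, None)
        else:
            # Same key on both sides: emit every left/right combination.
            right_rows_for_key = list(right_group)
            for left_row in left_group:
                for right_row in right_rows_for_key:
                    yield {**left_row, **right_row}
            left = next(left_groups, None)
            right = next(right_groups, None)


left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"},
        {"id": 2, "a": "z"}, {"id": 4, "a": "w"}]
right = [{"id": 2, "b": "p"}, {"id": 3, "b": "q"}, {"id": 4, "b": "r"}]
joined = list(merge_join(left, right, "id"))
```

    If either input is not sorted on the key, this kind of join silently misses matches, so the cost of sorting 10 million rows has to be counted as part of the merge join option.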




Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.