Hitachi Vantara Pentaho Community Forums
Results 1 to 6 of 6

Thread: diferences between MERGE JOIN, STREAM LOOKUP and DATABASE LOOKUP

  1. #1

    Default diferences between MERGE JOIN, STREAM LOOKUP and DATABASE LOOKUP

    Hello,

    I have been tried to understand when it is most suitable to use each of the following steps: MERGE JOIN, STREAM LOOKUP and DATABASE LOOKUP. Do not know if anyone has already addressed this to the forum. I would appreciate if someone could help me on this.

    thanks.

  2. #2
    Join Date
    Feb 2016
    Posts
    11

    Default

    I have used Merge Join and Stream Lookup (guess DB Lookup is same as Stream Lookup). Consider following two tables

    LeftCol1 RightCol1 RightCol2
    A A 1
    B A 2
    C D 4

    Join Left x Right on LeftCol1=RightCol1: Result (all records in Left table, duplicated if there are more than one matching in Right, notice row 1 and 2 below)
    LeftCol1 RightCol1 RightCol2
    A A 1
    A A 2
    B null null
    C null null

    In case of lookup, values from Left lookedup for first match only (no duplications of rows) so you'll get first, third and fourth row from above table.

  3. #3
    Join Date
    May 2013
    Posts
    16

    Default

    Could you please give more information about your case?

  4. #4
    Join Date
    May 2014
    Posts
    358

    Default

    Quote Originally Posted by perito37 View Post
    when it is most suitable to use each of the following steps: MERGE JOIN, STREAM LOOKUP and DATABASE LOOKUP.
    Merge join = when you need to join two large data sets which are ordered by the same keys. When a key occurs several times in one data set, it multiplies the number of output rows the same way as an SQL join.
    Stream lookup = you have a main stream and a secondary small stream (lookup stream) from which you want to look up exactly one row for the main stream, and the streams cannot be ordered the same way. Beware that all data from the lookup stream have to be kept in memory.
    Database lookup = gives you exactly one row from a database table based on the specified criteria. If more rows are found that satisfy the criteria, you get only one. By default, it makes 1 query on the database for each row, which is quite slow, so I suggest to use it to look up data in small tables which you can fit into the memory whole (Enable caching, load all data).
    Database join = one query per row, allows you to specify a custom SQL, and can return more than one row from the DB (therefore it can multiply the amount of rows on output).

  5. #5

    Default

    Quote Originally Posted by Lukfi View Post
    Merge join = when you need to join two large data sets which are ordered by the same keys. When a key occurs several times in one data set, it multiplies the number of output rows the same way as an SQL join.
    Stream lookup = you have a main stream and a secondary small stream (lookup stream) from which you want to look up exactly one row for the main stream, and the streams cannot be ordered the same way. Beware that all data from the lookup stream have to be kept in memory.
    Database lookup = gives you exactly one row from a database table based on the specified criteria. If more rows are found that satisfy the criteria, you get only one. By default, it makes 1 query on the database for each row, which is quite slow, so I suggest to use it to look up data in small tables which you can fit into the memory whole (Enable caching, load all data).
    Database join = one query per row, allows you to specify a custom SQL, and can return more than one row from the DB (therefore it can multiply the amount of rows on output).
    thanks for the answer Tukfi. So If I want to do left join from a bid table on to a small table, p`robably the best step to use is the Strem Lookup
    On the other side, when I want to cross big tables, Merge join is the best step
    I am not sure when Database lookup is a good choice. any answer on this?

  6. #6
    Join Date
    May 2014
    Posts
    358

    Default

    I'd say Database lookup is good for doing a lookup into a small table (which you can fit into memory).
    Or, if you need to look up something from a huge table and your data set is small, or it only contains a small number of keys that repeat many times, then a Database lookup (with caching, but not loading full table) will be faster than reading the whole huge table.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.