Hitachi Vantara Pentaho Community Forums
Results 1 to 4 of 4

Thread: File join, stream lookup

  1. #1
    Join Date
    Sep 2005
    Posts
    1,403

    Default File join, stream lookup

    Hi.
    What is the best way to do a 1 to 0,n join from 2 files ?
    For example:
    File1: FamilyID, FirstName
    FamilyID is unique and is the 'master' file. I must find in the result file all the FirstName


    File2: FamilyID, LastName, BirthDate
    Here, I have no key, I can have multiple rows in the file2 with the same FamilyId
    Not all FamilyID of the File1 are present, nethertheless.



    I need a result file like :File3 : FirstName, LastName, BirthDate
    LastName, BirthDate can be 'null' is they don't existe in File2



    When I use the stream lookup because of its internal structure (hastable), the result file is wrong because I don't have unique Key in File 2.



    Is there a solution for my problem ?



    Thanks in advance.
    ppa

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    Default RE: File join, stream lookup

    As much as I would like to minimize I/O, you probably need to store the files in temporary tables and do a database lookup or database join with that. Yesterday, I added a tracker for a feature request that will allow you to do complex comparissons in StreamLookup. (If the comparator is "equal to", it will use a hashtable, otherwise a sorted list.)

    Euh, hold on, wait a minute, you can use a "Join Rows" step and set the condition there.
    That should give you the result you like. It will give you a cartesian product, but you could use a "Unique Rows" step before joining.
    Make sure to change the names of the input fields of "Join Rows" in such a way that they are unique between the 2 streams. Otherwise it's difficult to use the values in the condition.

    Hope this helps,

    Matt

  3. #3
    Join Date
    Sep 2005
    Posts
    1,403

    Default RE: File join, stream lookup

    I think i may be having the same kind of problem :/

    I am collecting data from a relational database in MySQL

    I have the 2 tables that relate to each other

    ----------------------------------------------

    Table 1: (products)

    product_id
    product_name
    product_description

    ----------------------------------------------

    Table 2: (products_info)

    product_id
    product_price
    product_quantity

    ----------------------------------------------

    In my web application these 2 table are joined using the following simple query:

    SELECT *
    FROM products AS p, products_info AS pi
    WHERE p.product_id = pi.product_id

    This will output this result:

    product_id
    product_name
    product_description
    product_id <------- (duplicate)
    product_price
    product_quantity

    What i need to do with Kettle/Spoon is the same idea. I need to join the fields from these 2 tables and output the result to a txt/csv file. I have been trying to do this using various steps yet cant quite get the result im looking for

    Any help would be really appreciated.

    Kind regards, Harris

  4. #4
    Join Date
    Nov 1999
    Posts
    9,729

    Default RE: File join, stream lookup

    Well, first of all, you can put the exact SQL from above into TableInput and it will work.
    If you don&#39;t specify the fields in text-file output it will work.
    As long as you know that referencing product_id (2 occurences in the row) will mean manipulating ONLY the first occurence!
    Please know that having duplicate field names in rows is bad practice in Kettle as much as it is bad practice in SQL... (and some of databases even don&#39;t allow this)

    If you want to source from products and add product info, use a simple "Database Lookup" step with product_id as the key.
    You can also use a "Database Join" step where you can specify any complex SQL you would like as a joiner.

    Hope this helps,

    Matt

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.