Hitachi Vantara Pentaho Community Forums
Results 1 to 5 of 5

Thread: Transform flat data to parent / child (header / detail) streams?

  1. #1
    Join Date
    Jan 2017
    Posts
    2

    Question Transform flat data to parent / child (header / detail) streams?

    I fear this is a rudimentary question, but we have spent quite a bit of time investigating to no avail:

    We have an flat input of (x) columns. In that input stream, several of the columns represent header information, and the rest are detail information. We want to transform this into the target RDBMS parent / child tables, each of which has an internal primary key.

    I *think* we can do this in two steps in Kettle; 1) Use "Unique Rows" to generate a stream of the header data, and write that to the parent table, 2) process all detail rows into the child table with a lookup to go find the PK of the parent table. This feels clunky and not reusable.

    Are we missing something? Is there a simple way to split the original stream into parent / child, and have the PK from the RDBMs injected into the stream? Is there a way to use metadata injection to create and "teach" a GENERIC transformation what columns are header and what columns are detail? Has anyone created a 3rd party transformation to address this?

    I feel like this is a very common activity for ETL/Dx, which makes me worry we have missing something simple...

    All help is appreciated!

  2. #2
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    Without seeing the input & expected output, it's hard to discuss in any detail.

    What I would ask is: How would you do the work if you were doing it by hand?
    Answering that question usually walks you through what you need to do in PDI

  3. #3
    Join Date
    Jan 2017
    Posts
    2

    Default

    I think that means the "right" thing to do is 1) Gather the list of input fields that belong in the Parent, make that list unique, and insert them into the Parent RDBMS table. Then make a second pass with the "rest" of the input fields, use a lookup to add the primary key from the RDBMs parent table to the stream, and insert the resultant stream into the RDBMS child table.

    If that's the process; is there a way to define the list of input fields that belong in the parent and the list of input fields fields that belong in the child VIA metadata injection?

    Second, since the table insert can automatically retrieve the PK from the RDBMS parent table, is it simpler to relink the original stream to the output stream of the table insert to retrieve the PK value, or do a lookup?

  4. #4
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    Not saying how would you do it programmatically...

    If I gave you the sheet of paper that has your input on it, and asked you to output it onto two sheets of paper, what would you do?

  5. #5
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    Quote Originally Posted by clucksted View Post
    ... insert them into the Parent RDBMS table. Then make a second pass with the "rest" of the input fields, use a lookup to add the primary key from the RDBMs parent table to the stream ...
    If possible, use Table-Output option "Return auto-generated key" to return the internal PK of inserted rows.

    Quote Originally Posted by clucksted View Post
    ... is there a way to define the list of input fields that belong in the parent and the list of input fields fields that belong in the child VIA metadata injection?
    At first you should design the transformation without ETL-Metadata-Injection. Typically, if you find yourself maintaining multiple transformations having identical design but differing fieldnames (for example) you will think about injection.

    Quote Originally Posted by clucksted View Post
    ... is it simpler to relink the original stream to the output stream of the table insert to retrieve the PK value, or do a lookup?
    With the master rows containing the PK it would be best to do a Merge-Join to have the foreign key ready for detail row insertion. While Merge-Join requires the streams to be sorted on the (possibly compound) join key, Stream-Lookup is a tad more demanding on memory and/or processing time. Only you know the relevant numbers, so you decide.
    So long, and thanks for all the fish.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.