Hitachi Vantara Pentaho Community Forums
Results 1 to 13 of 13

Thread: Dynamic Denomormalizer

  1. #1
    Join Date
    Apr 2008
    Posts
    4,696

    Default Dynamic Denomormalizer

    I didn't think I would end up being one of these people, but...

    I have an XML file that is normalized down to the level:
    <row>
    . <id>1</id>
    . <key>ColName</key>
    . <value>MyColumn</value>
    </row>

    I want to convert this over to Excel tabs for Documentation purposes.
    I can get the output stream to the level of:
    id | key | value
    1 | ColName | MyColumn

    But.. Not every table described in the XML has the same number of keys.
    I can break the XML down using the Get Data from XML so that it is only describing a single table, but at this point, I just want to denormalize a dynamic number of keys to columns, which then would go into an Excel output.

    I can't figure out an easy way to do this.

    Can someone give me a pointer?
    **THIS IS A SIGNATURE - IT GETS POSTED ON (ALMOST) EVERY POST**
    I'm no expert.
    Take my comments at your own risk.

    PDI user since PDI 3.1
    PDI on Windows 7 & Linux

    Please keep in mind (and this may not apply to this thread):
    No forum member is going to do your work for you. We will help you sort out how to do a specific part of the work, as best we can, in the timelines that our work will allow us.
    Signature Updated: 2014-06-30

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    I agree that the denormalizer step might be a good candidate for the experimental "Metadata Injection" step (>=4.1.0).
    Do me a favor and create a JIRA case for both Normalizer and De-normalizer. I'll try to get it into 4.2.

    All other workaround would have to use some kind of Kettle transformation generation (perhaps you can hack something with a Kettle transformation in XML and some clever XSLT but that's really not my cup of tea). Another solution obviously is to dynamically generate the transformation metadata using Java. Both solutions are sub-optimal IMO.

    Cheers,

    Matt

  3. #3
    Join Date
    Nov 2008
    Posts
    143

    Default

    I believe your problem is similar to joao's.

    I'd try to read all data, gather key columns, then do some sort of "left outer join" with xml data x key columns.

    Does it help?

  4. #4
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    Jira Added:
    http://jira.pentaho.com/browse/PDI-5383

    In the end, I cheated.

    Group By ID, Concatenate key values with , and Concatenate value values with ,
    Split the flow for keys and values, Limiting to one key per set or rows (forgot that there was one layer more!)
    Rename both keys and values to cell
    Append Streams
    Split Value cell on , for all rows. With Split Values, you must specifiy the number of columns to make, so I specified 30 numbered rows.

    In the end, output to Excel, with Headers off, and it looks like I pulled off the impossible.

  5. #5
    Join Date
    Feb 2010
    Posts
    19

    Default

    gutlez,

    Thanks! I had to read it a few times, but that solution is awesome. Appear to do the impossible--I love it.

  6. #6

    Default

    Quote Originally Posted by gutlez View Post
    Jira Added:
    http://jira.pentaho.com/browse/PDI-5383

    In the end, I cheated.

    Group By ID, Concatenate key values with , and Concatenate value values with ,
    Split the flow for keys and values, Limiting to one key per set or rows (forgot that there was one layer more!)
    Rename both keys and values to cell
    Append Streams
    Split Value cell on , for all rows. With Split Values, you must specifiy the number of columns to make, so I specified 30 numbered rows.

    In the end, output to Excel, with Headers off, and it looks like I pulled off the impossible.
    Can you post the .krt file please?

  7. #7
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    Unfortunately, I can't post the KTR, since it has a bunch of confidential info in it.

    Basically I started with:

    Table | Row | Column | Value
    1 | 1 | ID | 1
    1 | 1 | Name | Bob
    1 | 1 | Position | Builder
    1 | 2 | ID | 2
    1 | 2 | Name | Dora
    1 | 2 | Position | Expolorer
    2 | 1 | ID | 1
    2 | 1 | Station | TeleToon

    I need to convert that to:
    ID | Name | Position
    1 | Bob | Builder
    2 | Dora | Explorer
    ID| Station
    1 | TeleToon

    So...
    Sort on Table and Row
    Group on Table and Row - Set two Aggregates: One is Subject Column, the other is Subject Value. Both are configured to "concatenate values with ,"
    Split the output from the Group By step

    Path a)
    Select Values (Only allow Value Column)

    Path b)
    Sequence By Group step (Reset when Table Changes)
    Filter Rows (Only allow the first of each table)
    Select Values (Only allow Header Column)

    Combine Path A and Path B
    Split Fields on "," (I used 30 columns)
    Sort Rows on Table Name

    You will need to clean this up a bit and adapt it for your use.
    **THIS IS A SIGNATURE - IT GETS POSTED ON (ALMOST) EVERY POST**
    I'm no expert.
    Take my comments at your own risk.

    PDI user since PDI 3.1
    PDI on Windows 7 & Linux

    Please keep in mind (and this may not apply to this thread):
    No forum member is going to do your work for you. We will help you sort out how to do a specific part of the work, as best we can, in the timelines that our work will allow us.
    Signature Updated: 2014-06-30

  8. #8

    Default

    Quote Originally Posted by gutlez View Post
    Unfortunately, I can't post the KTR, since it has a bunch of confidential info in it.
    Can you at least paste a print screen of the transformation ? Would appreciate it

  9. #9
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    If you follow the steps listed in the prior post, you'll have what I have.

  10. #10
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Implemented the metadata injection for the "normaliser" and "de-normaliser" steps as well as "Excel Input".
    Look here for an example: http://jira.pentaho.com/browse/PDI-5422

  11. #11
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    The idea escapes me still... I've downloaded the test case, and haven't quite figured out exactly how it's supposed to work.

    I tried with 4.1.3-r-14837, and get a NPE on the Excel Input on the Template... Is there anything special I need to do?

    However, this must be said:

    THANK YOU MATT!
    You do a lot of work on these tools, and I rarely see you get the appreciation that you deserve - a lot of the "miracles" that I pull off around my office are a credit to the tools that you started and continue to develop. Thank you for your hard work!
    Last edited by gutlez; 02-25-2011 at 01:25 PM.

  12. #12
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Mmm, I did fix an NPE in "Excel Input" in r14831. File a new bug report for that one.

    The point is that for situations where the layout of an XLS is not known in advance there are now solutions.
    It's not just this example. Imagine that you have your own UI, for example a web interface or a SaaS kind of setup. In that case you can also use this templating exercise to solve the puzzle. You dump the layout of the file in whatever format and pass it along to PDI.

    Matt

    P.S. You're welcome

  13. #13
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    I'm not sure...
    The NPE could be coming from the Injection - The Metadata Injector is suggesting that it has no steps to inject into.

    I tried to find an artifact for 14831, since the JIRA mentions it, but I couldn't find one on ci.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.