Hitachi Vantara Pentaho Community Forums
Results 1 to 8 of 8

Thread: Table Output to new dimension table (new row for every column value)

  1. #1

    Question Table Output to new dimension table (new row for every column value)

    Hi,

    We have a unique row for tracking every new column data as a new row in a separate table. For example, the 'customers' table below (configured as a table out step):

    Name:  Customers.png
Views: 49
Size:  3.5 KB

    When the customers table is populated, the 'event' table insertion should be triggered and it should look something like this:

    Name:  staging.jpg
Views: 50
Size:  18.6 KB

    What step(s) can help me achieve this? Please suggest.
    Last edited by Charles Barton; 02-16-2011 at 07:36 AM.

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    You can pretty much only solve this with dynamic SQL and that means the "exec SQL" step.
    This kind of construct: "UPDATE ? SET ? = ? WHERE ? = ?"

    While slow it would work. I did something similar in PL/SQL once with dynamic SQL and that was horribly slow as well.
    It's a little bit strange you don't get the primary key for the row to update though. I would also expect a mention of the type of operation: update, insert, delete for example.

  3. #3

    Post

    Hi Matt,

    Thanks for the suggestions. Actually there is no update operation. All rows are table inserts.

    The first table 'customers' is populated via a Table output step. The previous step to this is Join rows (Cartesian product).
    The second table 'event' is used for tracking history of insertions in the 'customers' table. So for every new row in 'customers' a new row must be created in 'event' but picking up one column at a time from 'customers'.
    For example: There are three columns to consider in 'customers' - customer_name, customer_type_id and customer_description. So for a new now in 'customers', I need to create 3 new rows in table 'event' each containing the value for that column.

    Hope I explained it better this time

    Thanks,
    Charles Barton

  4. #4
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Ah, you need to create the event table. Sorry, I missed that one :-)
    Well, this should be really easy to do with the "Row Normalizer" step.
    For example, you can put fields "customer_name", "customer_type_id" and "customer_description" in the first column in that step.
    The type then becomes "staging_column_name". Finally you can specify any of these names in the last column: "event_data_varchar", "event_data_int" and "event_data_date".
    The time and the name of the table can be added with a "System Info" and an "Add Constants" step.

    HTH,

    Matt

  5. #5

    Default

    Thanks Matt.. You're the man!
    I had tried an approach using Row Normalizer and your response just confirmed the solution.

    I used a Row Normalizer from the Table output step -> used Generate rows and Get system info to add the other columns -> Connected all three via Join Rows (Cartesian product) -> And finally to my table output for 'Staging_event'.

    Thanks and regards,
    Charles Barton

  6. #6

    Default

    Hi Matt,

    I'm trying to find steps to implement another operation on the same tables. Whenever a row is updated in the 'customers' table, a new row must be inserted in 'event' table for every column value that is modified.

    Let's say the customer_type_id is changed from 2 to 3 in second row of customers table:

    Name:  Customers (updated).png
Views: 33
Size:  3.4 KB

    Now this modification must be inserted as a new row in the event table:

    Name:  Customers (updated).jpg
Views: 33
Size:  20.8 KB

    My transformation is something like this right now:

    Name:  transformation.jpg
Views: 35
Size:  14.1 KB

    Please suggest the right steps to implement the update operation.

  7. #7
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    It's pretty much the same story for updates, right?
    I'm happy to see I wasn't that far off the ballpark when I said there should be updates and deletes in the events table :-)

  8. #8

    Default

    There is one problem with the Table 'Update' step here. It sends across all the incoming rows and not just the ones which are updated. Is there a way to change this behaviour?
    The other challenge is to capture the column names and respective values that were updated in a particular row. How can this be achieved?

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.