Hitachi Vantara Pentaho Community Forums
Results 1 to 8 of 8

Thread: Loading related tables which have constraints?

  1. #1
    Join Date
    Jan 2011
    Posts
    21

    Default Loading related tables which have constraints?

    Newbie Here so be nice

    I have a single spreadsheet source which I need to load into two tables one of which is required to exist for the second to insert the foreign reference. sponsor and rolodex where rolodex_id is the key in rolodex and the foreign key in sponsor.

    I created a transform with a excel input which splits into two streams by selecting the necessary columns for each table including rolodex_id which is in both streams.

    I need to assure that rolodex inserts first so sponsor insert will see the value and not fail due to the constraint that the rolodex value must exist.

    I think I need to add a "Block this Step until steps finish" but not sure if that's correct. Can anyone help me determine the correct stream for this task.

  2. #2
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    You have two main options...

    1) Use a Job
    Job steps are processed (in the usual configuration) in sequence, rather than in parallel. Transformation steps are processed in parallel.
    You would write two transforms: Populate rolodex and Populate Sponsor. This will ensure that rolodex is loaded first.

    2) Set Transformation Transactional (terminology varies depends on version) in Transformation Settings.
    This will use a single connection to the database for the Transform, which means that rolodex entries and sponsor entries will be added at the same time, which the DB should allow, even with FK constraints.
    **THIS IS A SIGNATURE - IT GETS POSTED ON (ALMOST) EVERY POST**
    I'm no expert.
    Take my comments at your own risk.

    PDI user since PDI 3.1
    PDI on Windows 7 & Linux

    Please keep in mind (and this may not apply to this thread):
    No forum member is going to do your work for you. We will help you sort out how to do a specific part of the work, as best we can, in the timelines that our work will allow us.
    Signature Updated: 2014-06-30

  3. #3
    Join Date
    Jan 2011
    Posts
    21

    Default

    I think the transactional idea could work. I will give that a try. I forgot to mention that the rolodex_id is created by a sequence in the transform (not an oracle sequence) so I load the spreadsheet then stream to the sequence then split the stream to two branches one that loads rolodex and one that loads sponsor. So I need the code in one transform so I have the rolodex_id that was used. i guess I could just add the sequence to my spreadsheet and that would solve to problem to allow for two separate transforms or to use job. I don't really understand how a job is different from a transform yet so I need to look into that and understand it more.

    Thanks for the ideas. It definitely points me in the right direction.

  4. #4
    Join Date
    Feb 2009
    Posts
    296

    Default

    Jobs run transformations one by one. So you could have a transformation that reads your spreadsheet and adds the stuff to your dimension. By sending the results to the next job you can THEN (in the sense of "after the first is done") load your data into the fact table.

    I would, moreover suggest you go with the dimension lookup/update job which handles that kind of situation very efficiently so you do not have to split your stream.
    Fabian,
    doing ETL with his hands bound on his back

  5. #5
    Join Date
    Jan 2011
    Posts
    21

    Default

    I tried this option
    2) Set Transformation Transactional (terminology varies depends on version) in Transformation Settings.
    This will use a single connection to the database for the Transform, which means that Rolodex entries and sponsor entries will be added at the same time, which the DB should allow, even with FK constraints.

    But setting the Transformation to "Make the transformation database transactional" in the Transformation Properties didn't work. The database constraint still fails.
    The only way I can figure out to fix this is to put my sequence data into a new spreadsheet output and then use the two transformation technique to put the data into the tables.

    I don't see how "Make the transformation database transactional" can have only one database session in a transaction given that the configuration for database connection is in two separate Input/Output steps. Wouldn't each have a separate transaction / commit ? So the constraint still fails.

    I realize the solution already provided in these threads but I wanted to see if a true database transaction across steps is possible?

  6. #6
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    When the transformation is transactional, a single connection and a single transaction is used across all steps. However, you obviously still have to make sure that you do indeed insert valid data.

  7. #7
    Join Date
    Mar 2010
    Posts
    159

    Default

    Even inside a single transaction you will get FK constraint failures if the order of the inserts is not correct. Can you be certain that the parent record is inserted prior to any child records needing the parent in your transform, or is it possible that a child could make it through first?

  8. #8
    Join Date
    Nov 2008
    Posts
    143

    Default

    I can think of ways using Blocking Step and Table output with error handling, but it depends on what exactly your transformation does.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.