Hitachi Vantara Pentaho Community Forums
Results 1 to 16 of 16

Thread: Confused about steps order!!!

  1. #1
    Join Date
    Jul 2010
    Posts
    25

    Default Confused about steps order!!!

    Hello,

    I have a question about transformation steps order of execution. I read in this forum that there is no step order in a transformation and that all steps/threads are started at the same time.

    if I have a transformation with 3 steps:
    Step1: reads rows from database 1
    Step2: using some columns from the row, the step looks up data in database 2
    Step3: Insert/Update the newly injected data into database 3.

    My question is it possible that step3 happens before step2 and therefore I might insert NULL into my database? How can I prevent this?

    Thanks,
    Rasheedik

  2. #2
    Join Date
    Mar 2010
    Posts
    159

    Default

    Any row will pass through the steps in a predictable sequence via the hops. There's just no guarantee the order in which rows will be processed within a single step.

    Regards,
    Jeremy

  3. #3
    Join Date
    Jul 2010
    Posts
    25

    Default

    Thank you Jeremy.

    But I am not able to get the below scenario working well. Let me try to explain what I am trying to do which I thought it was going to be really simple.

    I am trying PDI for the first time and thought of doing some migration work. So I read from a source database de-normalized membership data. For every membership data record, I want to create a member (i.e. PK member_id) record, a membership (i.e. PK membership_id) record, a transaction (i.e. PK transaction_id) record and a log (with FK member_id, membership_id and transaction_id) record in the target database.

    The inserts work...but some log records have missing foreign keys (i.e. member_id, membership_id and transaction_id) which seem to indicate that the log record got inserted while the dependent record was yet to be generated.

    I tried it one transformation with 4 steps and one job with 4 transformations (where I do the copy to results). The outcome is more or less the same...I constantly have missing FKs.

    Please help. Thanks.

    Regards,
    Rasheedik

  4. #4

    Default

    Hi,

    do you turned your transformation transactional (one connection per database) :

    http://wiki.pentaho.com/display/EAI/...nsformationTab

    in "Miscellaneous" tab.

    Samatar
    Samatar

  5. #5
    Join Date
    Jul 2010
    Posts
    25

    Default

    Hi Smatar,

    Thank you very much for your answer.

    I forgot to mention in my previous post that I actually tried setting the 'transactional database' on all my transformations. I noticed that nothing is actually written to the database after the main job is done yet my tables are locked and seem to be waiting for a commit!! How do I do auto-commit?

    I also noticed that there is a setting in the 'insert /update' step for the commit size...is this related?

    I really appreciate any pointer.

    Regards,
    Khaled

  6. #6

    Default

    If the commit size is greater that zero, auto commit will be inactive and a commit will occur after each commit size rows processed.
    As such you should set a commit size if you make transformation transactional that way PDI can rollback if any erros happen.

    http://wiki.pentaho.com/display/EAI/Insert+-+Update

    Anyway, can you please attach a sample of a picture of your first scenario (one transformation used).

    Samatar
    Samatar

  7. #7
    Join Date
    Jul 2010
    Posts
    25

    Default

    Hi Samatar,

    Thank you for your help.

    Attached please find my simple transformation that fails. I use Oracle as the DB and I am sorry I could not make the transformation independent of my DB connection. I hope it will give an idea though.

    I also attached an image of how the data looks like in the database log table when the transformation completes. You may notice the missing FKs!! Please note that the inserts worked well.

    The problem is probably the extra look up step that I am doing after the insert/update ....but I could not find a better way.

    Your help will be extremely appreciated.

    Thanks,
    Rasheedik
    Attached Images Attached Images  
    Attached Files Attached Files

  8. #8
    Join Date
    Jul 2010
    Posts
    25

    Default

    I am also attaching an image of how the transformation looks like just in case you could not open the .KTR file.

    Regards,
    Rasheedik
    Attached Images Attached Images  

  9. #9
    Join Date
    Jul 2010
    Posts
    25

    Default

    Sorry to post again. But I wanted to let you know that if I set the transformation to 'transactional', I gt the following failure:
    Code:
    2010/07/24 18:48:12 - Log output.0 - ERROR (version 4.0.0-stable, build 13376 from 2010-06-15 11.00.26 by buildguy) : Because of an error, this step can't continue: 
    2010/07/24 18:48:12 - Log output.0 - ERROR (version 4.0.0-stable, build 13376 from 2010-06-15 11.00.26 by buildguy) : org.pentaho.di.core.exception.KettleException: 
    2010/07/24 18:48:12 - Log output.0 - ERROR (version 4.0.0-stable, build 13376 from 2010-06-15 11.00.26 by buildguy) : Error inserting row into table [MOS_TEST_LOG] with values: [Name], [-202180289], [1368769068], [-32593027], [30ki2uuss9928], [19d6vo27idcao], [0.2], [6013], [6013], [4003], [4003], [241]
    2010/07/24 18:48:12 - Log output.0 - ERROR (version 4.0.0-stable, build 13376 from 2010-06-15 11.00.26 by buildguy) : 
    2010/07/24 18:48:12 - Log output.0 - ERROR (version 4.0.0-stable, build 13376 from 2010-06-15 11.00.26 by buildguy) : Unable to release database transaction savepoint
    2010/07/24 18:48:12 - Log output.0 - ERROR (version 4.0.0-stable, build 13376 from 2010-06-15 11.00.26 by buildguy) : Unsupported feature
    2010/07/24 18:48:12 - Log output.0 - ERROR (version 4.0.0-stable, build 13376 from 2010-06-15 11.00.26 by buildguy) : 
    2010/07/24 18:48:12 - Log output.0 - ERROR (version 4.0.0-stable, build 13376 from 2010-06-15 11.00.26 by buildguy) : org.pentaho.di.trans.steps.tableoutput.TableOutput.writeToTable(TableOutput.java:429)
    2010/07/24 18:48:12 - Log output.0 - ERROR (version 4.0.0-stable, build 13376 from 2010-06-15 11.00.26 by buildguy) : org.pentaho.di.trans.steps.tableoutput.TableOutput.processRow(TableOutput.java:116)
    2010/07/24 18:48:12 - Log output.0 - ERROR (version 4.0.0-stable, build 13376 from 2010-06-15 11.00.26 by buildguy) : org.pentaho.di.trans.step.RunThread.run(RunThread.java:40)
    2010/07/24 18:48:12 - Log output.0 - ERROR (version 4.0.0-stable, build 13376 from 2010-06-15 11.00.26 by buildguy) : java.lang.Thread.run(Thread.java:619)
    and my Oracle database tables are locked waiting on commit.

    Regards,
    Rasheedik

  10. #10
    Join Date
    Jul 2010
    Posts
    25

    Default

    Hi,

    Please someone help me with this! I really hope there is a solution or probably my approach is not valid.

    Regards,
    Rasheedik

  11. #11
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Since member and membership have no identifier (AFAICT), I would use a "Combination Lookup/Update" step to populate both tables.

    Please note that the logic is OK and Kettle works fine with it. Your database is not however.

  12. #12
    Join Date
    Jul 2010
    Posts
    25

    Default

    Hi Matt,

    Thank you very much for your reply.

    I will try the "Combination step" and report back. But, I am using the migration_id (the PK of the source table) to look up whether the row exists or not.

    I have no choice but to use Oracle! We use it as our production database and we cannot easily change it.

    Regards,
    Rasheedik

  13. #13
    Join Date
    Jul 2009
    Posts
    476

    Default ETL steps are like assembly line workers

    Rasheedik,

    The ETL steps are like workers on an assembly line. The first step is the worker at the beginning of the line, and when it starts creating output for the second worker (second step), it starts sending that output immediately, so the second worker can start doing his work, too, and so on.

    Each row in ETL is like a product on the assembly line. The assembly line has lots of products (rows) that are partially done, and the workers (steps) are all working at the same time to send those products (rows) down the line (transformation). However, all products (rows) visit the workers (steps) in the same order.

    This gets more complicated when you have multiple input streams and branching, but that's something you don't need to worry about until the need arises.

  14. #14
    Join Date
    Jul 2009
    Posts
    476

    Default Hmmm - Didn't see the entire thread

    Rasheedik,

    I only saw your first post and the first reply in the thread, so my response was generic. I see that you've moved into the details of your situation, so my answer may not be very useful to you. My apologies.

  15. #15
    Join Date
    Jul 2010
    Posts
    25

    Default

    Hi Rob,

    No problem at all! Your explanation about the ETL steps was very well said.

    Thanks,
    Rasheedik

  16. #16
    Join Date
    Jul 2010
    Posts
    25

    Default

    Hi Matt,

    Using the "Combination step" does work without setting the 'transactional' check. So thank you very much.

    However, in my real migration, I will need to do DB look up and add fields to the stream as I go along. My questions are:

    1. What are the circumstances in which the 'transactional' configuration has to be set?
    2. Why was I getting what seemed-to-be a race condition? Is it DB problem?


    Thanks again,
    Rasheedik

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.