Hitachi Vantara Pentaho Community Forums
Results 1 to 9 of 9

Thread: Defining transaction boundraries

  1. #1

    Default Defining transaction boundraries

    Hi, I was wondering if there was a way to define a transaction boundary in kettle. For example, I want to update some records, insert new records and delete some records as a unit of work or I will have inconsistent data.

    Thanks.

  2. #2
    Join Date
    May 2006
    Posts
    4,882

    Default

    You can put the whole of a transformation in a transaction (usually). Else the steps define when commits occur.

    Regards,
    Sven

  3. #3

    Default

    Thanks - how do I put a transformation in a transaction?
    Can I do the same for a job ?

  4. #4
    Join Date
    May 2006
    Posts
    4,882

    Default

    For a transformation... go to transformation properties (ctrl-t) and in the miscellenaous tab set the unique connection on. In all of the database steps you use use a commit size of 0 or either a very very big one.

    In jobs... no, but there are not many database job entries.

    In general ETL tools more like static environments.

    Regards,
    Sven

  5. #5

    Default

    Hi, I am again faced with the transaction problem and thought I would ask again in case any new features have been added. I am using Kettle 2.5.1 now, and would consider 3.x if it adds value.

    My transformation goes as follows
    1. Query temp table
    2. Update table 1
    3. Insert into table 2
    4. Update table 3
    i.e. for each record in the source I need to see that steps 2,3, 4 are performed under a single transaction. While the idea of placing the entire transformation into a transaction (single connection), I am concerned that it could fail as my source table could have 300K records and as i understand it the entire transformation (all actions) fall under a transaction. i.e. my understanding is that steps (2, 3, 4) * number of records would be part of a single transaction. I am hoping that my understanding of the problem is incorrect.

    An alternate idea that I read about is to create a SQL Script with the update, insert, update statement.

    I would prefer not to use the SQL Script as it becomes less manageable.

  6. #6
    Join Date
    May 2006
    Posts
    4,882

    Default

    You're not making much sense... you want the steps in 1 transaction, but you're afraid off the effects when they're in 1 transaction... you can't have both.

    Put all functionality in 1 transformation, use single connection, and put the transaction size to 0 (or a very big number) and all falls in 1 transaction... don't and it doesn't

    Regards,
    Sven

  7. #7

    Default

    Sven, I want each record that goes through steps 2, 3, 4 to be part of a transaction - i.e. for a single record, steps 2, 3, 4 must commit or fail.

    Code:
    for each record
      being transaction
        1. update
        2. insert
        3. update
      end transaction
    However, I have >300K records that are processed as part of the transaction. I am concerned that if the whole transformation is part of a single transaction the transaction may time out.

  8. #8
    Join Date
    May 2006
    Posts
    4,882

    Default

    Well, that's not the way PDI works (not in 2.5, still not in 3.0 or 3.1).

    You could start up a new transformation per input row but that would be silly.

    Regards,
    Sven

  9. #9

    Default

    I am doing something similar with larger volumes of data, and the method that Sven has described works fine.

    PS

    Im inserting into an Oracle table

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.