Hitachi Vantara Pentaho Community Forums

Thread: Delete From Target before loading

  1. #1

    Default Delete From Target before loading

    For various reasons, I have concluded that I need in this particular case to do things in this order:

    1. Extract data in extract transformation from source between date_a and date_b.
    2. In Load transformation, first delete data from target for data between date_a and date_b.
    3. In Load transformation, do an insert of all data extracted from (1).

So basically I want the same effect as checking the "truncate table" option on the target table, except with a delete based on a date range stored in variables.

The previous developer had put the delete in another transformation using a "Delete" step, but a separate transformation means a separate transaction, so there is a gap during which data is missing from the target.

I just need to know the best way to do the delete and then the insert in the same transaction so there is no gap in the data. An upsert isn't going to work here.
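Outside of Kettle, the semantics I'm after would look like the sketch below: delete the date range and insert the extracted rows inside one transaction, so a failure rolls both back and readers never see the gap. This uses Python's sqlite3 as a stand-in DBMS; the table and column names (target, event_date, value) are made up for illustration.

```python
import sqlite3

# In-memory stand-in for the target DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (event_date TEXT, value INTEGER)")
conn.execute("INSERT INTO target VALUES ('2019-06-01', 1), ('2019-06-02', 2)")
conn.commit()

date_a, date_b = "2019-06-01", "2019-06-30"
extracted = [("2019-06-01", 10), ("2019-06-02", 20)]  # rows from step (1)

# One transaction: the connection context manager commits on success
# and rolls back on error, so the delete and insert are atomic.
with conn:
    conn.execute(
        "DELETE FROM target WHERE event_date BETWEEN ? AND ?",
        (date_a, date_b),
    )
    conn.executemany("INSERT INTO target VALUES (?, ?)", extracted)

rows = conn.execute("SELECT value FROM target ORDER BY event_date").fetchall()
```

If the insert raised, the delete would be rolled back too, so the old rows would still be in place.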

    Thanks,

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    Have a look at the "Merge Rows (Diff)" step.
    So long, and thanks for all the fish.

  3. #3

    Default

Ultimately that might be of some use, but it doesn't solve the fundamental problem, since it just passes data on to the next step. I would still have to delete the data in the target.

    The way I see it these two things won't work:
1. Put the delete and the source-to-target stream in the same transformation: they will run in parallel, which isn't what I want (though maybe "Block this step until steps finish" would help?).
2. Put them in two different transformations: they will execute in separate transactions.

  4. #4
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    Quote Originally Posted by urbanmojo View Post
    Ultimately that might be of some use, but it doesn't solve the fundamental problem, since it just passes data on to the next step. I would still have to delete the data in the target.
    Well, it's a classical pattern from my own problem domain, so I figured it could help you, too.
    Attached Images
    So long, and thanks for all the fish.

  5. #5
    Join Date
    Jul 2009
    Posts
    476

    Default

If you use a DBMS that supports partitioning, and your "date_a" to "date_b" ranges are fairly regular, such as weekly, monthly or yearly, then you might consider partitioning your target table in the DBMS. The target table name in the Table Output or Bulk Loader step would include the partition name as a parameter, so instead of just "target_table" it would be "target_table_${partition}", where the "partition" parameter is passed into your transformation. You could then check the Truncate table option, and only that partition would be truncated before you load your data.
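The "target_table_${partition}" idea can be sketched outside Kettle like this: each date range lives in its own physical table, so truncating before the load only clears that one partition and leaves the others intact. This is a hypothetical sketch using sqlite3 (which has no TRUNCATE, so DELETE without a WHERE clause stands in); the names target_table_2019_05/2019_06 are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One physical table per monthly partition.
for p in ("2019_05", "2019_06"):
    conn.execute(f"CREATE TABLE target_table_{p} (event_date TEXT, value INTEGER)")
    conn.execute(f"INSERT INTO target_table_{p} VALUES ('seed', 1)")
conn.commit()

partition = "2019_06"                 # the ${partition} parameter
table = f"target_table_{partition}"

# "Truncate" only the selected partition, then load the new extract.
conn.execute(f"DELETE FROM {table}")
conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", [("2019-06-01", 10)])
conn.commit()

untouched = conn.execute("SELECT COUNT(*) FROM target_table_2019_05").fetchone()[0]
loaded = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

The 2019_05 partition keeps its row while 2019_06 holds only the freshly loaded data.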


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.