Hitachi Vantara Pentaho Community Forums
Results 1 to 5 of 5

Thread: Incremental Load best practices......

  1. #1
    Join Date
    Dec 2015
    Posts
    21

    Default Incremental Load best practices......

    Hi Everyone,

    I amtrying do incremental load. And it's type 1 dimension. The old data getsreplaced new changes.

    I amgetting the max(last_modified_ts) from the target table using "Table Input1" and pass it thro' another "Table Input 2" in where clause to
    get onlythe changed records from the source. And up to this part, it works.

    Afterthat, I need to compare the column "col_pk" from "Table Input2" result set against "col_pk" of "Target table".
    If
    TableInput 2.col_pk = target table.col_pk then compare the remaining columns in thetable to check if there was any change. If so, update that record.
    If
    TableInput 2.col_pk = target table.col_pk and when compared there is no changes toremaining columns then don't update.
    If
    TableInput 2.col_pk not found in target table.col_pk then insert into target table.

    It's kindof merge. So, I tried merge rows - diff. The flagfield for insert and updatecomes as "new" and the not changed records comes as"deleted".
    Or I can"delete" the new records from target and then "insert". Inboth cases, I am not sure, how to pass this "flagfield" to triggerthe delete and insert or
    updateand insert. Please find attached document that shows merge rows-diff setup andthe results set.

    Pleasehelp.

    Thanks,
    Raji.
    Attached Images Attached Images

  2. #2
    Join Date
    Dec 2015
    Posts
    21

    Default

    Found a way to do it. After Merge rows(diff) -> filter rows for only "new" flagfield -> then insert/update.


    And thanks for the reference.

    Thanks,
    Raji.
    Last edited by RajiR; 01-06-2016 at 01:02 PM.

  3. #3
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    Merge Rows is intended to have two incoming streams of the same data, sorted in the same way. Your process description doesn't describe that.

    Looking quickly at your PDF (not the best way to share things on here), I notice that you have both the key and the value set to the same field. That doesn't make much sense... You will limit yourself to two outcomes: it's either new, or deleted.

    You probably want to compare keys on comm_network_service_sk and values of test_incr
    This will then give you three outcomes: new, updated, or deleted.

    The step that you would want to use after that is the "Sync after merge" step, which will update the table based on the flagfield

  4. #4
    Join Date
    Dec 2015
    Posts
    21

    Default

    I have 2 incoming stream...delta records from table_input and target table using another table_input. Sorry for not being very clear and about the attachment.

    I added "test_incr" column in compare keys, but still the flagfield shows only "new" and "deleted", not showing any "updated". I am not sure if I am missing any setting inside merge rows(diff).

    Thanks for the help.

  5. #5
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    You need to remove comm_network_service_sk from the fields to compare, but leave it in the Keys column.

    Keys = How to tell if row is to be compared
    Fields = Things to compare to see if they are different
    Last edited by gutlez; 01-07-2016 at 11:29 AM.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.