Hitachi Vantara Pentaho Community Forums
Results 1 to 10 of 10

Thread: Incremental load.

  1. #1
    Join Date
    Jun 2014
    Posts
    7

    Post Incremental load.

    Hi,

    Currently i am using truncate load concept....using an external table.
    I want to make my data load incremental (insert new records and update the old ones)
    How can this be achieved in most efficient way.....so that it should not take more time than truncate load concept as data is very Large.

    please help.
    Thanks in advance

  2. #2
    Join Date
    Mar 2014
    Posts
    28

    Default

    Hi,
    Use insert/update component to add the new records and update the existing one.

    Regards,
    Bhanu

  3. #3
    Join Date
    Nov 2013
    Posts
    341

    Default

    Insert/Update is usually extremely slow for large data sets as every single row requires a lot of overhead (sql "get" to check if it's there + insert or update).

    If you can rely on some non duplicate key try insert + error control to update only the offending rows. Test it and monitor performance.

  4. #4
    Join Date
    Mar 2014
    Posts
    181

    Default

    Look at the Merge Rows (diff) step. it might be helpful.

  5. #5
    Join Date
    Apr 2008
    Posts
    4,448

    Default

    Quote Originally Posted by Black-hawk View Post
    I want to make my data load incremental (insert new records and update the old ones)
    Do you have an Insert Timestamp and an Update Timestamp?
    If you do, then the usual method is:
    Transform 1:
    Table Input 1 (Select MAX(Insert Timestamp) from DestinationTable ) -> Table Input 2 ( Select * from SourceTable where Insert Timestamp > ? ; Insert Data from Table Input 1) -> Table Output (Insert Rows into DestinationTable)
    Transform 2:
    Table Input 1 (Select MAX(Update Timestamp) from DestinationTable ) -> Table Input 2 ( Select * from SourceTable where Update Timestamp > ? ; Insert Data from Table Input 1) -> Table Update (Update Rows in DestinationTable)

    Put Transform 1 and Transform 2 into a job so that Transform 2 runs after Transform 1 completes.

  6. #6
    Join Date
    Jun 2014
    Posts
    7

    Default

    Hi,

    thank you guys.......it was very useful for me

  7. #7
    Join Date
    Feb 2013
    Posts
    530

    Default

    How can it be done on fact table ? Generally we don't store any timestamp field in it, so what could be the way to achieve on fact table ?

    Let's say : A,B,C,D are fields in an excel file where "D" is the only measure. dumped the same into staging area ? (Should we add another field (timestamp) while doing truncate load into staging ?
    1st day excel file : 100 rows
    2nd day excel file : new 100 rows
    How can we make a ID start from 101 in fact table ?

    I am new to warehouse concepts and learning . Could you please share your inputs on this ?

    Thank you.
    and etc.
    All the
    - Sadakar Pochampalli

  8. #8
    Join Date
    Apr 2008
    Posts
    4,448

    Default

    Sadakar,

    That's not related to the thread.
    Please do not hijack threads for a different issue - it makes it a lot harder to understand the conversation. Please start a new thread for your topic.

  9. #9
    Join Date
    Feb 2013
    Posts
    530

    Default

    I'm sorry if I made this conversation harder with my question to understand the thread but I think my question is related to the same incremental loading concept.

    Anyhow, I am stopping posting on this particular thread and opening a new.

    Thank you :-) :-)
    - Sadakar Pochampalli

  10. #10
    Join Date
    Jul 2016
    Posts
    17

    Default Simple and Perfect !

    Quote Originally Posted by gutlez View Post
    Do you have an Insert Timestamp and an Update Timestamp?
    If you do, then the usual method is:
    Transform 1:
    Table Input 1 (Select MAX(Insert Timestamp) from DestinationTable ) -> Table Input 2 ( Select * from SourceTable where Insert Timestamp > ? ; Insert Data from Table Input 1) -> Table Output (Insert Rows into DestinationTable)
    Transform 2:
    Table Input 1 (Select MAX(Update Timestamp) from DestinationTable ) -> Table Input 2 ( Select * from SourceTable where Update Timestamp > ? ; Insert Data from Table Input 1) -> Table Update (Update Rows in DestinationTable)

    Put Transform 1 and Transform 2 into a job so that Transform 2 runs after Transform 1 completes.

    Works well, is simple to do. Thanks a Ton !

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2017 Pentaho Corporation. All Rights Reserved.