Hitachi Vantara Pentaho Community Forums

Thread: Implementing CDC while reading from an Excel file

  1. #1
    Join Date: Nov 2015, Posts: 4

    Implementing CDC while reading from an Excel file

    Hi everyone,

    I want to implement CDC while reading data from an Excel file. I have an idea of how to do it with Table Input, but I don't have a clue how to do it with Excel files. Kindly provide some help.

    Thanks

  2. #2
    Join Date: Dec 2009, Posts: 332


    If you are comparing the data from two Excel files that have the same columns and an appropriate unique key, you could do CDC by feeding both into a Merge Rows (diff) step and sending its output to a Synchronize after Merge step, somewhat like this:
    [Image: PDIMergeSync.jpg]
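As far as I understand it, Merge Rows (diff) compares a reference stream (the previous load) with a compare stream (the new file) on the key fields, both sorted by that key, and tags each row with a flag such as "identical", "changed", "new" or "deleted"; Synchronize after Merge then uses that flag to decide whether to insert, update or delete. The Python below is only a sketch of that comparison, assuming each sheet has already been read into a list of dicts and that "id" (a hypothetical field name) is the unique key. It is not PDI code.

```python
def diff_rows(reference, compare, key="id"):
    """Pair rows by key and return (flag, row) tuples, mimicking a diff on a unique key."""
    ref_by_key = {row[key]: row for row in reference}
    cmp_by_key = {row[key]: row for row in compare}
    result = []
    # Walk every key seen in either file, in key order (both inputs sorted by key).
    for k in sorted(ref_by_key.keys() | cmp_by_key.keys()):
        old, new = ref_by_key.get(k), cmp_by_key.get(k)
        if old is None:
            result.append(("new", new))          # only in the new file: insert
        elif new is None:
            result.append(("deleted", old))      # only in the old file: delete
        elif old == new:
            result.append(("identical", new))    # unchanged: nothing to do
        else:
            result.append(("changed", new))      # same key, different values: update
    return result

previous = [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}, {"id": 3, "qty": 1}]
current = [{"id": 1, "qty": 5}, {"id": 2, "qty": 9}, {"id": 4, "qty": 2}]
print([flag for flag, _ in diff_rows(previous, current)])
# ['identical', 'changed', 'deleted', 'new']
```

Because rows are matched on the key rather than on their position, inserting or reordering rows in the sheet does not by itself mark other rows as changed.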

    If there is no unique key, you could merge on the row number, but this is likely to cause a cascade of updates. Rows are flagged whenever they differ, so if you insert a new row, every row below it will be treated as changed.
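To illustrate the cascade described above, here is a small Python sketch (not PDI code) that uses the row position as the key. The letters stand in for whole rows and are purely illustrative; inserting a single row after row 1 makes every later row look changed.

```python
def diff_by_row_number(reference, compare):
    """Pair rows by position, i.e. use the 1-based row number as the merge key."""
    flags = []
    for i in range(max(len(reference), len(compare))):
        old = reference[i] if i < len(reference) else None
        new = compare[i] if i < len(compare) else None
        if old is None:
            flags.append((i + 1, "new"))
        elif new is None:
            flags.append((i + 1, "deleted"))
        else:
            flags.append((i + 1, "identical" if old == new else "changed"))
    return flags

before = ["a", "b", "c"]
after = ["a", "x", "b", "c"]  # one row inserted after row 1
print(diff_by_row_number(before, after))
# [(1, 'identical'), (2, 'changed'), (3, 'changed'), (4, 'new')]
```

Only one row was really added, yet rows 2 and 3 are reported as changed and the old row 3 reappears as a "new" row 4.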

  3. #3
    Join Date: Nov 2015, Posts: 4


    Thanks, khelms, for the help, but what I want to do is this: whenever I run the ETL, I load some data from the Excel file. The next time I run the ETL, I want only the new data, based on the date field in my file, just like an incremental load from database tables.

  4. #4
    Join Date: Jun 2012, Posts: 5,534


    Quote (originally posted by Ehtesham):
    I want to implement CDC while reading data from an Excel file. I have an idea of how to do it with Table Input.
    With Table Input you can use a WHERE clause so that the database engine keeps old rows out of the result set.
    With Excel Input you'll have to add a Filter Rows step after it that compares your date field with the last date you loaded.
    It's that easy.
    So long, and thanks for all the fish.
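The Filter Rows idea above amounts to a single comparison per row. The Python sketch below shows that condition outside of PDI; the field name "load_date" and the cutoff value are hypothetical, and in a real transformation the cutoff would come from wherever the previous run recorded it.

```python
from datetime import date

def filter_new_rows(rows, last_loaded, date_field="load_date"):
    """Keep only rows dated strictly after last_loaded.

    Plays the role of the WHERE clause on Table Input, or of a Filter Rows
    step placed after Excel Input. "load_date" is a hypothetical field name.
    """
    return [row for row in rows if row[date_field] > last_loaded]

rows = [
    {"id": 1, "load_date": date(2015, 11, 20)},
    {"id": 2, "load_date": date(2015, 11, 25)},
    {"id": 3, "load_date": date(2015, 11, 30)},
]
print([r["id"] for r in filter_new_rows(rows, date(2015, 11, 25))])  # [3]
```

A strict `>` skips rows that are added to the sheet later but carry the same date as the cutoff, while `>=` reloads the cutoff day's rows on every run. Which trade-off is acceptable depends on how the file is filled.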

  5. #5
    Join Date: Nov 2015, Posts: 4

    Incremental load from a file

    Hi everyone,

    I want to load data from an Excel file that contains a date field. Every time the ETL runs, it should load only the new records, based on that date field.
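For the filter to pick up only new records on every run, the last loaded date has to survive between runs (a high-water mark). The Python sketch below keeps it in a small JSON file; the file name, the "last_loaded" key and the "load_date" field are all hypothetical, and in PDI the same value could just as well be kept in a database table and read back at the start of the run.

```python
import json
from datetime import date
from pathlib import Path

def load_watermark(path, default=date.min):
    """Read the last loaded date, or return default on the very first run."""
    p = Path(path)
    if not p.exists():
        return default
    return date.fromisoformat(json.loads(p.read_text())["last_loaded"])

def save_watermark(path, value):
    """Persist the newest date loaded so far for the next run."""
    Path(path).write_text(json.dumps({"last_loaded": value.isoformat()}))

def incremental_run(rows, state_path, date_field="load_date"):
    """Return only rows newer than the stored watermark, then advance it."""
    last_loaded = load_watermark(state_path)
    new_rows = [r for r in rows if r[date_field] > last_loaded]
    if new_rows:
        save_watermark(state_path, max(r[date_field] for r in new_rows))
    return new_rows
```

The first run loads everything, a second run on the same file loads nothing, and once a row with a later date is appended only that row comes through. The watermark is advanced only after rows are selected, so if the downstream load fails it would be safer to save it after the load has succeeded.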

    Kindly provide some help.

    Best Regards.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.