Hitachi Vantara Pentaho Community Forums
Results 1 to 5 of 5

Thread: Ascending Incremental load on db key

  1. #1

    Default Ascending Incremental load on db key

    I have a remote db table where each row has a key starting at 1 and counting up. I want to periodically pull data from this table to a local table.

    I would like to be able to pull the new rows only, using the fact that the new rows will be any rows with keys higher that the highest one i already have.

    Is there a mechanism available in kettle to do this which does not require loading all the data?

  2. #2
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    You can link Table Input steps together.

    1) Table Input: Select MAX(ID) from LocalTable
    2) Table Input: Select * from RemoteTable where ID>? (Insert Data from 1)
    3) Table Output: All Fields into LocalTable

    This is something that Kettle excels at...

  3. #3

    Default

    I have two transformations, one for getting the max id the other for getting the data and writing it to the local db, these are being called in order in a job.

    I am using a setvariables step to save the max id, which appears from the logs to work:
    2009/07/01 13:42:26 - Set Variables.0 - Set variable maxID to value [85]

    In this step: 2) Table Input: Select * from RemoteTable where ID>? (Insert Data from 1)
    I am trying to specify the variable which was set in the previous transformation using: SELECT * FROM RemoteTable WHERE ContactID>${maxID}
    This does not work, if i replace ${maxID} with 85 it works. Am i specifying the variable incorrectly or do i need to do it a different way for table input?
    Last edited by theirishmole; 07-01-2009 at 08:49 AM.

  4. #4

    Default

    I had not checked "replace variables in script", with that checked the incremental load now works

  5. #5
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    There are (at least) two ways to do what you wanted...

    The "Select * from RemoteTable where ID>?" way is if you want to have two table inputs directly attached to each other in a single transform (no Variable setups needed)

    The other way is two use two transforms (as you did) setting a variable, and then using the "Select * from RemoteTable where ID>${MAXID}" query. You are correct that in that setup, you need the "Replace Variables" option set.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.