Hitachi Vantara Pentaho Community Forums

Thread: Periodic Processing of new records

  1. #1
    Join Date
    Sep 2005
    Posts
    13

    Periodic Processing of new records

    Hi,

    Is PDI the right tool for what I want to do? I would appreciate any input on this!

    Requirement
    Every 5 minutes, transfer the records newly added to a table in one database to a table in another database, where a different application processes them.

    I thought PDI would help with the scheduling and with making sure that only the new records get transferred, but I have not found a way to manage this.

    What I tried to do:
    1) The transaction table has an adddate column, so I am trying to filter the transaction records to those between the adddate used for the last upload and the newest adddate at the time the new upload process starts. I store the adddate used for the last upload in a table called LastUpload; this table has a single record holding the "LastUploadDate".
    2) I get the newest adddate with "select max(adddate)" on the transaction table (both lookups are sketched below).
    3) I store these dates as variables.
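
    The two lookup queries boil down to something like this (the transaction table name is simplified here):

        -- Last adddate used by the previous upload
        SELECT LastUploadDate FROM LastUpload;

        -- Newest adddate at the time this upload starts
        SELECT MAX(adddate) FROM transactions;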

    .... and I am stuck.... I have not been able to use these variables to filter the transaction records to transfer. I tried using the variables in the SQL of a Table Input step and in a SQL job entry.

    Please let me know how to use the variables to filter the transaction table, or let me know if there is an easier procedure.

    Thanks!

    Alberto.

    PS. I attach the transformation I developed.

  2. #2
    Join Date
    May 2006
    Posts
    4,882


    - You can't use variables in the same transformation where you set them
    - You can only set a variable once per transformation (you can of course call a transformation multiple times)

    Make a transformation that sets your variables... my guess is that you already have that one. Make a new transformation to load your data from source to target. Make a job to run both in sequence (a rough sketch follows below).
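
    Assuming that layout (the file names are just for illustration), the job would look like:

        Job: incremental_load
          START
            -> set_upload_window.ktr   (Table Input -> Set Variables)
            -> copy_new_rows.ktr       (Table Input using the variables -> Table Output)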

    If you are using Oracle as the source and the source is active while you extract your data, you have a whole new range of problems.

    Regards,
    Sven

  3. #3
    Join Date
    Sep 2005
    Posts
    13

    How do I call the variables in the SQL statement?

    Hi Sven,

    Thanks for your reply.

    >> Make a new transformation to load your data from source to target. Make a job to
    >> run both in sequence.

    I have a job, but I have been unable to "call" the variables in the new transformation. I tried to use a Table Input step and reference the variables in the SQL WHERE clause, but I have not found an example of how to do that.

    Thanks,

    Alberto

  4. #4
    Join Date
    May 2006
    Posts
    4,882


    Just switch on "Replace variables in script" at the bottom of the Table Input step and use something like ${VAR} in your SQL (where VAR is the name of your variable).
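
    For example, with variables LAST_UPLOAD_DATE and NEW_UPLOAD_DATE set in the first transformation (the variable names and the date quoting are just an assumption; they depend on your database and date format):

        SELECT *
        FROM transactions
        WHERE adddate >  '${LAST_UPLOAD_DATE}'
          AND adddate <= '${NEW_UPLOAD_DATE}'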

    Or use a Get Variables step and pass the variables as input from a previous step (this is what I would do if you want to use a variable as an argument in the WHERE clause).
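
    With that approach the SQL uses ? placeholders, and the Table Input step's "Insert data from step" option points at the Get Variables step; each ? is filled from the incoming fields in order (again, names are illustrative):

        SELECT *
        FROM transactions
        WHERE adddate >  ?   -- first field from the Get Variables step
          AND adddate <= ?   -- second field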


    Make sure the Set Variables step uses the "valid in the root job" scope. And if you use multiple sub-jobs (jobs within jobs) you may have problems in 2.5.

    Regards,
    Sven

  5. #5
    Join Date
    Sep 2005
    Posts
    13

    It works great! Now, what are the gotchas with Oracle?

    Sven,

    Thanks a lot ! I followed your instructions and it works perfectly.

    Since in one of your posts you mentioned that with Oracle there were other problems "lurking", I would appreciate it if you could expand a bit on this, since in a couple of months the system will be moving to precisely that platform.

    Thanks again for your help,

    Alberto

  6. #6
    Join Date
    May 2006
    Posts
    4,882


    For the Oracle problem:

    - Don't think a database is just a database... you can't just move from one database to another and expect everything to work the same. Moving to Oracle has caused many people serious headaches... Oracle has a cleaner design than most other databases, but still. E.g. something that locks rows on DB2 can happily continue on Oracle: on Oracle, two transactions only block each other when they update the same row, and reading a row never takes a lock. If your application depends on that locking behaviour, it will work on DB2 but fail in strange ways on Oracle.

    - Specifically, the CCD problem I mentioned before (and I was bitten by it once myself): if transactions are still running on Oracle while you extract data, you may miss rows. E.g. whenever a row changes, a trigger updates a timestamp column in that row (a poor man's change data capture), and your ETL extracts data based on that timestamp to pick up only the not-yet-processed rows. Now suppose a data-processing transaction is running but has not committed yet; your ETL runs, extracts its data, and finishes; then the data-processing transaction commits, and its row may carry a timestamp earlier than the latest maximum timestamp you extracted. The next time your ETL runs it won't pick up that row (because it has an earlier timestamp)... hence missing rows. A sketch follows below.
    It's described in other places on the internet with more examples... the "real" solution is log mining, but that's not an "easy fix" just to move from one database to Oracle.
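
    To make the timestamp scenario concrete (the trigger and table names are invented for illustration):

        -- Stamp every inserted or updated row with the current time
        CREATE OR REPLACE TRIGGER trg_stamp_transactions
        BEFORE INSERT OR UPDATE ON transactions
        FOR EACH ROW
        BEGIN
          :NEW.adddate := SYSTIMESTAMP;
        END;
        /

        -- The race, as a timeline:
        -- 10:00:00  transaction T1 updates a row; the trigger stamps it 10:00:00 (not yet committed)
        -- 10:00:05  the ETL runs; it only sees committed rows, so MAX(adddate) is, say, 10:00:04,
        --           and it records 10:00:04 as its high-water mark
        -- 10:00:10  T1 commits; its row still carries adddate = 10:00:00
        -- next run  the ETL asks for adddate > 10:00:04 and never picks up T1's row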

    Regards,
    Sven
