
Thread: What is the significance of "DEPDATE" & "REPLAYDATE" in the ETL Log Table?

  1. #1
    Join Date
    Sep 2005
    Posts
    1,404

    What is the significance of "DEPDATE" & "REPLAYDATE" in the ETL Log Table?

    Hi Matt,

    Just looking for:

    What is the significance of "DEPDATE" & "REPLAYDATE" in the ETL Log Table?

    I guess: what do they track, and how can you use them?

    Thanks

  2. #2
    Join Date
    Nov 1999
    Posts
    9,689

    RE: What is the significance of "DEPDATE" & "REPLAYDATE" in the ETL Log Table?

    DepDate: Dependency date. It helps with the calculation of the date range (Start Date Range and End Date Range in the "Get System Info" step).

    In the transformation window you can set a list of dependency fields. If any of these fields has a maximum date higher than the dependency date of the last run, the date range is set to (-oo, now), i.e. from the beginning of time until now.
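    To make the rule concrete, here is a minimal sketch of the date-range decision in Python. It is not Kettle's code: the function name, its parameters, and the choice of None to stand for an unbounded start (-oo) are assumptions for illustration, as is treating a first run with no logged history as "load everything".

```python
from datetime import datetime
from typing import Optional, Sequence, Tuple

# Hypothetical representation: a start of None means "unbounded" (-oo).
DateRange = Tuple[Optional[datetime], datetime]


def compute_date_range(
    previous_end: Optional[datetime],
    previous_depdate: Optional[datetime],
    dependency_max_dates: Sequence[datetime],
    now: datetime,
) -> DateRange:
    """Sketch of the incremental date-range rule with dependency fields.

    previous_end: end of the date range of the last run (None if no run logged).
    previous_depdate: DEPDATE logged for the last run (None if no run logged).
    dependency_max_dates: current maximum date of each dependency field.
    """
    if previous_end is None or previous_depdate is None:
        # Assumption: with no logged run, everything up to now is loaded.
        return (None, now)
    if any(d > previous_depdate for d in dependency_max_dates):
        # A dependency changed after the last run: open the range to (-oo, now).
        return (None, now)
    # Normal incremental run: only what changed since the previous range ended.
    return (previous_end, now)
```

    With no dependency newer than the logged DEPDATE, the run stays incremental; a single newer dependency date is enough to open the range.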

    The use case is the incremental population of Slowly Changing Dimensions (SCD).
    Let's say you load 1000 customers from the source table S_CUSTOMER into the dimension DIM_CUSTOMER. Each customer has a category, and you load the category description from another table, S_CATEGORY. This category description is a Type I field.
    Now suppose you load only the customers that have been modified or added since the last run (say, 5 customers).
    In that case, when a category description changes, you need to update the category description not only of those 5 customers but also of the other 995.
    If, and only if, all customers are present in the source system, you can open up the date range to allow all customers to be updated.
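    The customer example can be sketched as follows. The Customer record, the last_modified column, the half-open (start, end] range semantics, and both helper functions are hypothetical; the point is only that a normal range picks up the 5 changed customers, while an opened range picks up all 1000, so every Type I category description gets overwritten.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Customer:
    customer_id: int
    category_id: int
    last_modified: datetime  # hypothetical change-tracking column in S_CUSTOMER


def select_customers(
    customers: List[Customer], start: Optional[datetime], end: datetime
) -> List[Customer]:
    """Pick the source rows to (re)load; start=None means the range is open (-oo)."""
    return [
        c
        for c in customers
        if (start is None or c.last_modified > start) and c.last_modified <= end
    ]


def build_dim_rows(
    selected: List[Customer], category_desc: Dict[int, str]
) -> Dict[int, str]:
    """Type I: overwrite each selected customer's category description."""
    return {c.customer_id: category_desc[c.category_id] for c in selected}
```

    Note that opening the range only works because the full customer list is still in the source; if old customers had been purged there, they would never be revisited.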

    And that's what the dependency date DEPDATE is for. See also: Transformation Settings, Dependencies tab.

    The replay date indicates that the transformation was replayed (re-tried, run again) with that particular replay date (run date). You can use it in Text File Input or Excel Input, which let you save the line numbers of erroneous lines into a file (SOURCE_FILE.line, for example). During a replay, only the lines that had errors in them are passed on to the next steps; the other lines are ignored.
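    A rough sketch of the replay idea, assuming the error file holds one 1-based line number per line. The function names and file layout are assumptions, not the actual format Text File Input writes, and how the replay date picks which error file to read is left out here.

```python
from pathlib import Path
from typing import Iterable, Iterator, List, Set, Tuple


def write_error_lines(error_file: Path, line_numbers: Iterable[int]) -> None:
    """Normal run: remember which source lines failed, one number per line."""
    error_file.write_text("".join(f"{n}\n" for n in sorted(set(line_numbers))))


def read_error_lines(error_file: Path) -> Set[int]:
    """Load the failed line numbers; no file means nothing to replay."""
    if not error_file.exists():
        return set()
    return {int(token) for token in error_file.read_text().split()}


def replay_lines(lines: List[str], error_file: Path) -> Iterator[Tuple[int, str]]:
    """Replay: yield only the (1-based) lines that had errors last time."""
    wanted = read_error_lines(error_file)
    for number, text in enumerate(lines, start=1):
        if number in wanted:
            yield number, text
```

    Keying on line numbers assumes the corrected document comes back with the same line layout; if the source adds or removes lines, the stored numbers no longer point at the right records.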

    This is for the following use case: if the document contained errors (bad dates, characters in numeric fields, etc.), you simply send the document back to the source (probably the user or department that created it) and, when you get it back, re-run the last transformation.

    OK, this is probably more than you bargained for :-)

    Cheers,

    Matt
    Matt Casters, Chief Data Integration
    Pentaho, Open Source Business Intelligence
    http://www.pentaho.org -- mcasters@pentaho.org

    Author of the book Pentaho Kettle Solutions by Wiley. Also available as e-Book and on the Kindle reading applications (iPhone, iPad, Android, Kindle devices, ...)

    Join us on IRC server Freenode.net, channel ##pentaho

  3. #3
    Join Date
    Sep 2005
    Posts
    1,404

    RE: What is the significance of "DEPDATE" & "REPLAYDATE" in the ETL Log Table?

    Matt,

    Lots of detail, thanks!

    I get what's going on and will look at getting these in use if the situation arises.

    Many Thanks!
