Hitachi Vantara Pentaho Community Forums
Results 1 to 4 of 4

Thread: How to Control Points

  1. #1
    Join Date
    Jun 2007
    Posts
    476

    Default How to Control Points

    Hi all, i'm starting to write an ETL, and my client is asking that we have control points every step where we send a query to a DB. What we need is to check if the query run succesfully, and if there was a DB error, we need to catch that error, send an e-mail to the responsible and stop the process, so they can check this, correct and re-run the job, but it has to start at the Control Point where it ended

    hope someone can help me with this

  2. #2

    Default

    Hello!

    Your task is not an easy one.

    In general, you can add an error handling to each step that touches the database (insert, update, ...). Right-click on the step and click "Define error handling".

    It is impossible to define a general process for "fixing the problem and then reexecuting the job from that point on".

    You can use a single transaction per database connection per transformation so in many cases, you changes will be rolled back if a step fails. But it can be very hard to roll back the changes that were made up to this step in the job.

    This is the schema of my data warehouse filling process:

    1. Check if the process is running, mail + exit if true
    2. Update internal dimensions (date, ...), mail + exit if failed
    3. Read fact tables to staging area, mail + exit if failed:
    3a In the read transformation, output from multiple databases is merged in a Sort step. This way, I can be sure that either all the databases could be read or the process didn't write anything into the DWH staging area (because the Sort step blocks until it received its last input).
    4. Read dimension tables to staging area, mail + exit if failed
    4a Just like 3a
    5. Store dimension changes in the DWH
    6. Store fact changes in the DWH

    Data are read from the transaction systems to the staging area. From the staging area, it gets deleted after it was inserted successfully into the normal data warehouse area.

    The whole process is designed so that filling the staging area is possible even it contains data (after a previous error, for example) and filling the DWH from the staging area is transactional, so it either works correctly or leaves the data for reinserting.

  3. #3
    Join Date
    Jun 2007
    Posts
    476

    Default

    Quote Originally Posted by balazsb View Post
    The whole process is designed so that filling the staging area is possible even it contains data (after a previous error, for example) and filling the DWH from the staging area is transactional, so it either works correctly or leaves the data for reinserting.
    The main problem we have is that our hole ETL process can do up to even TERAS, so if i can't have the hole process run for a second time if the previous one crashed at the last sql statement!!!

    What i thik i will be doing is something like this: have a "status table" where i insert the Transformation name and the step and when the step succeds i update this table to the next step and every time i go into a control point i check this table to see if this step has already executed and skip it if so.

  4. #4

    Default

    Hello,

    Quote Originally Posted by rhaces View Post
    What i thik i will be doing is something like this: have a "status table" where i insert the Transformation name and the step and when the step succeds i update this table to the next step and every time i go into a control point i check this table to see if this step has already executed and skip it if so.
    That's part of our solution. In the status table, we store the ID of the source table which was read last and the last read time. Subsequent imports only read records that were added or modified later (change time > last read time OR source ID > stored ID for this table). If your source data allow this, you can go this path.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.