Hitachi Vantara Pentaho Community Forums

Thread: Error column in job log history

  1. #1
    Join Date
    Mar 2015
    Posts
    190

    Default Error column in job log history

    Hi

    I am using PDI CE 5.3 with a MySQL database. According to the PDI detailed job log table structure, there is a column named ERRORS. When my job runs successfully, 0 is stored there, but occasionally the column is left NULL even though the job succeeded.

    How can I avoid this issue?

    Thank you

  2. #2
    Join Date
    Mar 2015
    Posts
    190

    Default

    I have finally identified my issue, and I believe the following is the solution.

    While running the ALL ETL JOBS job (a job containing other jobs), log details are fully updated only for the MAIN JOB and only partially for the jobs inside it. Because the status columns are not fully updated for those inner jobs, the incremental load breaks: it falls back to considering all records from 1900-01-01 up to the current date, which is slow and pointless. Both the full load and the incremental load depend on a few Kettle log table attributes.

    Example: the Kettle log table attributes STARTDATE, ENDDATE, STATUS and ERRORS.

    When ALL JOBS runs, its own STARTDATE, ENDDATE, STATUS and ERRORS attributes are all updated, whereas for the jobs inside ALL JOBS only STARTDATE, ENDDATE and STATUS are updated, and a (null) value is placed in the ERRORS column.
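
    For context, an incremental load typically derives its window start from the log table with a query along the lines of the sketch below. The query itself is an assumption (your lookup logic may differ), but it shows why a NULL in ERRORS breaks things: the ERRORS = 0 predicate never matches, no row is found, and the load falls back to the 1900-01-01 default.

    -- Hypothetical last-successful-run lookup for one inner job;
    -- 'my_inner_job' is a placeholder name.
    SELECT COALESCE(MAX(ENDDATE), '1900-01-01') AS CDC_STARTDATE
    FROM JOB_LOG
    WHERE JOBNAME = 'my_inner_job'
      AND STATUS = 'end'
      AND ERRORS = 0;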

    To avoid extracting and loading the full data set every time, we need to update the Kettle log table rows from ERRORS = NULL to ERRORS = 0 where STATUS = 'end'. This UPDATE should be run before every incremental load.

    UPDATE JOB_LOG SET ERRORS = 0 WHERE STATUS = 'end' AND ERRORS IS NULL;
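
    (In PDI this statement can be automated, for example by placing it in a SQL job entry at the start of the main job, so it runs before any incremental load.)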

  3. #3

    Default

    Do you have any idea why Kettle is not updating the job log table columns?

  4. #4
    Join Date
    Aug 2011
    Posts
    360

    Default

    Quote Originally Posted by ranala View Post
    I have finally identified my issue, and I believe the following is the solution. [...] To avoid extracting and loading the full data set every time, we need to update the Kettle log table rows from ERRORS = NULL to ERRORS = 0 where STATUS = 'end'. [...]
    Hi,

    As I already stated in another post, I think it is a really bad idea to use the Kettle log system as a CDC date-tracking system.
    The main reason is that your date tracking then becomes tied to your implementation, rather than to the logical units you are loading.
    Suppose you simply rename a job: bam, the load starts again at 1900-01-01.
    Worse, you cannot use generic jobs with metadata injection, for example, since the job name would be the same for every source load.

    So I really recommend that you build a dedicated table and CDC system that tracks the extract/load dates of logical units, named like "mySource1_myTable1" or "mySource2_myTable2", or by component, etc.

    What we did at work was to build generic sub-jobs/transformations that do the following (see the SQL sketch after this list):
    - register a tracking logical name in a table, and initialize its load date if it does not exist yet;
    - look up a tracking name to get all its info (set as variables such as cdc_startDate) and log the start of the job in a table (keeping track of start/end status with dates);
    - finally, log the end status and update the CDC tracking dates on success.
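
    A rough MySQL sketch of what those three steps touch is below. All names here (ETL_CDC_TRACKING, TRACKING_NAME, CDC_STARTDATE, 'mySource1_myTable1') are hypothetical illustrations of the approach, not Kettle objects:

    -- Hypothetical CDC tracking table: one row per logical unit.
    CREATE TABLE ETL_CDC_TRACKING (
        TRACKING_NAME VARCHAR(255) PRIMARY KEY, -- e.g. 'mySource1_myTable1'
        CDC_STARTDATE DATETIME NOT NULL,        -- window start for the next load
        LAST_STATUS   VARCHAR(15)               -- e.g. 'start' / 'end'
    );

    -- 1) Register the logical name, initializing the load date if missing
    --    (INSERT IGNORE is MySQL syntax).
    INSERT IGNORE INTO ETL_CDC_TRACKING (TRACKING_NAME, CDC_STARTDATE)
    VALUES ('mySource1_myTable1', '1900-01-01');

    -- 2) Look up the tracking row; the result is set as a variable
    --    (e.g. cdc_startDate) for the job that follows.
    SELECT CDC_STARTDATE FROM ETL_CDC_TRACKING
    WHERE TRACKING_NAME = 'mySource1_myTable1';

    -- 3) On success only, log the end status and move the window forward.
    UPDATE ETL_CDC_TRACKING
    SET CDC_STARTDATE = NOW(), LAST_STATUS = 'end'
    WHERE TRACKING_NAME = 'mySource1_myTable1';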

    You then have a main job template with a job entry in the middle that uses variables to resolve the real job to execute, and you pass in parameters to set both the job to execute and the name of the CDC object to track.
