Hitachi Vantara Pentaho Community Forums

Thread: How/where to write a simple update statement to update columns in tables in Pentaho?

  1. #1
    Join Date
    Dec 2015
    Posts
    21

    Default How/where to write a simple update statement to update columns in tables in Pentaho?

    Hi Everyone,

    I created a staging table called "etl_run_history" which captures load_run_id, process_name, start_ts, complete_ts, and status.
    I created a transformation that populates everything, leaving complete_ts null and setting status to "started", and it works fine.

    Then I loaded the target table, which uses the load_run_id from "etl_run_history" (taking the max of load_run_id), and it loaded without issue.

    Now I want to update "etl_run_history" for this record as below:
    update mdt.etl_run_history
    set complete_ts = sysdate,
    status = 'completed'
    where status = 'started';

    I'm not sure how to update this table. I checked the "Execute SQL script" step and it's confusing: I don't know what to provide for "Field to contain insert stats", etc. I also tried the Insert/Update step, but I'm not sure what to give for the lookup keys. Is there any other step I can use to update a table?
    Please help.

    Thanks,
    Raji.

  2. #2
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    Why are you reinventing the wheel?

    In the transformation properties, on the Logging Tab, you can connect the Transformation to your logging table, and it will take care of setting all the data appropriately on its own.
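
    If it helps, the SQL button on that tab will even generate the table DDL for you. A transformation log table looks roughly like this (a sketch, not the exact output; the column set depends on which fields you enable, and etl_trans_log is just a placeholder name):

    CREATE TABLE etl_trans_log (
      ID_BATCH      INTEGER,       -- unique run id, assigned by PDI
      TRANSNAME     VARCHAR(255),  -- like your process_name
      STATUS        VARCHAR(15),   -- start / end / stop, like your status
      LINES_READ    BIGINT,
      LINES_WRITTEN BIGINT,
      ERRORS        BIGINT,
      STARTDATE     TIMESTAMP,     -- like your start_ts
      ENDDATE       TIMESTAMP,     -- like your complete_ts
      LOGDATE       TIMESTAMP
    );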

  3. #3
    Join Date
    Dec 2015
    Posts
    21

    Default

    Thank you so much.

    I am using "file repository" now and planning to move the trans/job to "database repository" down the road, since it's working local, but once I move the process to server (in unix) it fails. I will worry about it later.

    My question is...is the "id_batch" going to stay unique if I run using the file repository now and then switch to a "database repository"?

    Thanks,
    Raji.

  4. #4
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    Quote Originally Posted by RajiR View Post
    My question is...is the "id_batch" is going to stay unique if I run using file repository now and then switch to "database repository"?
    That's a good question.
    I haven't tried it to know for sure, nor have I looked at the source to see how it's calculated.
    My core feeling is: Try it.

  5. #5
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    Quote Originally Posted by gutlez View Post
    That's a good question.
    I haven't tried it to know for sure, nor have I looked at the source to see how it's calculated.
    My core feeling is: Try it.
    From a quick review of the code, it appears that if you configure it to keep using the same logging table, it will continue to increment the batch_id, keeping the values unique.
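
    That matches how the id is generated: it comes from the logging table itself, not from the repository. Conceptually the next batch id is derived something like this (a sketch of the idea, not the literal source; etl_trans_log stands for whatever log table you configured):

    -- next ID_BATCH comes from the table itself, so the sequence survives a repository switch
    SELECT MAX(ID_BATCH) + 1 FROM etl_trans_log;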

  6. #6
    Join Date
    Dec 2015
    Posts
    21

    Default

    The logging part works! I haven't tried the database repository yet.
    Thank you so much. You are awesome!

  7. #7
    Join Date
    Sep 2014
    Posts
    175

    Default

    I have also built a custom logging solution; the out-of-the-box one leaves a bit to be desired. First, information overload: at the end of the day, it's not clear how you link the 'transformation' logging to the job / job entry 'logging'. You would think it would be something simple, but all the IDs are a messy web. Not very intuitive. And then you have to pick which step or job entry you want for the inserts, or it will take the last one...

    And finally, if the logging itself screws up (your database or network connection was offline, or there was a syntax error), there's no record of that either, and no error handling.

    You could do your own custom logging by having a JavaScript job entry run the following:

    // Expose the previous job entry's result as variables for later entries.
    parent_job.setVariable("JOB_SUCCESS", previous_result.getResult());
    parent_job.setVariable("ERROR_COUNT", previous_result.getNrErrors());
    parent_job.setVariable("LINES_WRITTEN", previous_result.getNrLinesWritten());
    parent_job.setVariable("LOG_TEXT", previous_result.getLogText());
    true; // the script must evaluate to true for the entry to succeed

    You can name the environment variables whatever you want. Put this entry after the job entry you wish to log. If you want to log the entire job, make a meta-job that runs the job and then this JavaScript entry. (There are about a dozen built-in result fields like these you can look up.)

    For the next step, simply use an Execute SQL script job entry:

    UPDATE log_table
    SET log_text = '${LOG_TEXT}', error_count = ${ERROR_COUNT}
    WHERE log_id = ${LOG_ID}

    Note the quotes around the string variable, and make sure variable substitution is enabled in the entry. If you're using a field instead of a variable for the log id, you would simply write "where log_id = ?", put the log_id field in the parameters grid at the bottom, and check 'Execute for each row'.
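
    Putting it together, the per-row version would look something like this (a sketch; the question marks bind to the parameter fields by position, so list log_text, error_count, and log_id in that order in the grid):

    UPDATE log_table
    SET log_text = ?, error_count = ?
    WHERE log_id = ?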
