Hitachi Vantara Pentaho Community Forums
Results 1 to 5 of 5

Thread: Two identical subsequent Table inputs give different results (one is <null>)

  1. #1
    Join Date
    Aug 2016
    Posts
    290

    Default Two identical subsequent Table inputs give different results (one is <null>)

    A transformation is updating a table.

    One step is run before to read checksum. Another step is run after to read checksum after possible changes to the table.

    Name:  Untitled.jpg
Views: 34
Size:  7.1 KB

    But whenever the two checksum steps are run, the last one always returns null!

    Any help?

    Steps to reproduce:

    1) CHECKSUM TABLE test_table;
    2) INSERT INTO test_table...
    3) CHECKSUM TABLE test_table;

    Step 3 will return <null> instead of the checksum.
    Last edited by Sparkles; 01-18-2017 at 11:22 AM.

  2. #2
    Join Date
    Jul 2009
    Posts
    476

    Default

    Doing a checksum on an entire table doesn't make sense in most transformations. A transformation takes a stream of rows from an input step and runs them through a series of steps until they reach the output step. (That's admittedly a simplification.)

    I wouldn't try to checksum the table for every row that passes through the transformation. My guess is that you have a database locking issue.

    Why are you doing this?

  3. #3
    Join Date
    Aug 2016
    Posts
    290

    Default

    Doing a checksum on an entire table makes perfect sense in this transformation. This transformation takes a stream of exactly one row - the checksum - and runs that one row through the subsequent steps until reaching the output step.

    This transformation's only goal is to checksum the table before and after insertion for the only single row that is passed through.

    Usually when I have a database locking issue, I get an error message specifying exactly this (from Mysql).

    I am doing this in order to detect changes to a dimension table whenever it is updated.

    If a change is noticed, then etl will make a http post to mdx server to flush cache, or else the change will never be registered by the mdx server.

    I could either do this small operation in one single transformation with the use of 4-5 steps.

    Or I would have to make a root job with one step per transformation (checksum before, sql update of dim, checksum after, evaluation/comparison of checksums, then http post) which would be minimum 1 job and 3-4 transformations.

    In my case, I'd prefer just a single transformation and a single file doing this, instead of spreading it out in 4-5 files, job and transformations.

    If I'm not able to do a checksum of the table before and after updating it, then the entire etl process would be useless.
    Last edited by Sparkles; 01-19-2017 at 05:00 AM.

  4. #4
    Join Date
    Jul 2009
    Posts
    476

    Default

    If I understand correctly, you are doing the entire dimension update in your Update_MCC_MNC SQL Script step, and that step might update multiple rows (or none). You might try changing the "Checksum after" step from a Table Input step to a Database Join step, and then see if you get the second checksum.

  5. #5
    Join Date
    Aug 2016
    Posts
    290

    Default

    Thanks for suggestion, will try this.

    I was able to get the same functionality by moving everything up to job level. Then doing the checksum in a transformation before and after the sql update step (possible updating a table).

    The result checksum was written to variable and then used in a "Simple Evaluation" step to compare before and after.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.