Hitachi Vantara Pentaho Community Forums
Results 1 to 7 of 7

Thread: problem with Dimension lookup/update feature

  1. #1
    Join Date
    Jan 2016
    Posts
    3

    Exclamation problem with Dimension lookup/update feature

    I'm new to this forum and to Pentaho, so first of all hi everybody!


    I'm using Pentaho Data Integration General Availability Release 6.0.0.0-353 on top of a Virtual Machine with Windows Server 2008 R2.
    PostgreSQL 9.4 is the database used to store both Kettle repository and the DB I'm willing to read/load.
    I defined an ETL process which populates target tables through Dimension lookup/update feature.
    Most of the times everything works great: when an input record matches a natural key already present in the target table, Kettle inserts a new record or updates the old one depending on the defined insert/update criteria.
    It happened two times that Kettle wasn't able to recognize that the natural key was already present in the target table, so it inserted new records even if not needed, bypassing the insert/update criteria and without closing the old ones.



    example:


    source table:


    id_empl, dsc_empl
    10, 'John'


    target table:


    seq_empl, id_empl, desc_empl, startdate, enddate, version
    1, 10, 'John', 01/01/2016, 31/12/9999, 1




    Kettle reads the source table and inserts the record in the target table, even if the natural key id_empl = 10 is already there and nothing changed.
    This is the result:

    target table:


    seq_empl, id_empl, desc_empl, startdate, enddate, version
    1, 10, 'John', 01/01/2016, 31/12/9999, 1
    2, 10, 'John', 01/01/2016, 31/12/9999, 2




    I tried to disable caching but this didn't work.
    Is this a known issue?
    Is there any workaround?
    This happening in Production would be a serious issue.


    Thank you in advance!


    Francesco

  2. #2
    Join Date
    Jul 2009
    Posts
    476

    Default

    If you can attach a sample transformation that demonstrates the problem, that would help. You might want to change the database connection information before sharing it, because it can contain usernames and passwords.

  3. #3
    Join Date
    Jan 2016
    Posts
    8

    Default

    Hi,

    I encountered exactly the same type of problem. I have approximately 1,5m rows in 75 csv files and there are handful of records that causes similar issues for two different dimension tables. The files nor the records are not the same that causes the problem for each table.

    Help would be appreciated!

  4. #4
    Join Date
    Jan 2016
    Posts
    3

    Default

    Hi robj

    it's not a problem of a single transformation, it happened a few times on different environments with different transformations.
    I'm attaching a sample transformation (kettle transformation + postgreSQL tables).

    Thanx

    Francesco
    Attached Files Attached Files

  5. #5
    Join Date
    Jul 2009
    Posts
    476

    Default

    Francesco,

    I looked at your transformation. In the Dimension Lookup/Update step, the Codi_Comune column is your natural key. Desc_Comune, Codi_ISTAT, Sequ_TPR, and Codi_Provincia are your Lookup/Update fields. Your Type of Dimension Update for those 4 fields is "Insert," so if any of those field values are changed, the step will create a new row, even if the value of Codi_Comune is unchanged. That might be what you are seeing. If not, then I don't know why the step is creating new rows.

    Rob

  6. #6
    Join Date
    Jan 2016
    Posts
    3

    Default

    Quote Originally Posted by robj View Post
    Francesco,

    I looked at your transformation. In the Dimension Lookup/Update step, the Codi_Comune column is your natural key. Desc_Comune, Codi_ISTAT, Sequ_TPR, and Codi_Provincia are your Lookup/Update fields. Your Type of Dimension Update for those 4 fields is "Insert," so if any of those field values are changed, the step will create a new row, even if the value of Codi_Comune is unchanged. That might be what you are seeing. If not, then I don't know why the step is creating new rows.

    Rob

    Hi Rob

    the transformation inserts new records for the same Codi_Comune natural key even if none of the fields


    • Desc_Comune
    • Codi_ISTAT
    • Sequ_TPR
    • Codi_Provincia


    changes.
    It just occured a few times in Dev environment, that's why I think this might be a bug.
    This happening in Production would be a serious issue.

    Regards,
    Francesco

  7. #7
    Join Date
    Jul 2009
    Posts
    476

    Default

    Sounds like it's probably a bug. Two more things you might check:

    (1) Are any system settings changes causing the varchar columns look different, even though they are the same? I'm thinking of things such as changing your system's language or collation. I've rarely had to deal with anything other than regular ASCII characters, but I know sometimes language settings can make the same string values appear to be different.

    (2) You could trace your SQL calls in the Postgres DB to see exactly how the transformation is trying to compare the rows in the stream with the existing dimension data.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.