Hitachi Vantara Pentaho Community Forums
Results 1 to 6 of 6

Thread: SCD versionning with PDI

  1. #1
    Join Date
    Jun 2009
    Posts
    6

    Default SCD versionning with PDI

    Hi,

    I'm a starter with PDI, i just wanted to know if we are consraint with PDI to use a sequenced numeric field for specifying the version of the updated dimension.

    Let's say that i am using a SCD type 2 and i would like to manage the versions of updates by using a boolean field which is gonna be "Y" for the last version and "N" for previous ones: basically at the lookup, when a modification is detected in the source datbase for the row having the natural key "a", i would like to reflect these changes in my dimension by changing the current_version field of the most recently updated dimenison row that has "a" as natural key (of course i'm using a surrogate as primary) form "Y" to "N" and then add a new dimension row for the same natural key (with a new surrogate key) that has "Y" as current_version value.

    Is that possible with PDI, because i noticed that lookup/update step is always asking for version field which is gonna be incremented everytime a new row is gonna be inserted ?

    Thank you in advance

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    This is all done automatically in the "Dimension Lookup/Update" step, including the "current_version" field (in version 3.2 in the fields tab) and the key generation.

  3. #3
    Join Date
    Jun 2009
    Posts
    6

    Default

    Thanks Matt for replying,

    but what you have suggested is giving me only a numeric versionning field while i would like to have a boolean one.

    this is my source table

    CREATE TABLE `source`.`tb_src_region` (
    `CODE` varchar(5) NOT NULL,
    `NAME` varchar(50) NOT NULL,
    `DESCR` varchar(50) DEFAULT NULL,
    `STATUS` char(1) DEFAULT NULL,
    `STATUS_DATE` date DEFAULT NULL,
    PRIMARY KEY (`CODE`)
    );

    this is my dimension table

    CREATE TABLE `target`.`tb_dim_rgn_region` (
    `RGN_ID` int(7) unsigned NOT NULL AUTO_INCREMENT,
    `RGN_CODE` varchar(5) NOT NULL,
    `RGN_NAME` varchar(50) NOT NULL,
    `RGN_DESC` varchar(50) NOT NULL,
    `RGN_VALID_FROM_DATE` date DEFAULT NULL,
    `RGN_VALID_TO_DATE` date DEFAULT NULL,
    `RGN_CURRENT_INDICATOR` char(1) DEFAULT NULL,
    PRIMARY KEY (`RGN_ID`),
    KEY `ix_rgn_code` (`RGN_CODE`)
    );

    The mapping between source and target fields is as following:

    CODE-->RGN_CODE
    NAME-->RGN_NAME
    DESCR-->RGN_DESCR

    the RGN_CURRENT_INDICATOR is meant to be a boolean type versionning flag whose values are "Y" for current versions and "N" for old ones.

    In the lookup/update step, under the "field" tab i tried to put RGN_CURRENT_INDICATOR as dimension field without any stream field to compare with and i put "version" as type of dimension update. It dosn't work since i have to specify a stream field to compare with.

    i don't wanna have any incremental numeric version field which is suggested by lookup/update step. Could this be done?

    I can provide you the transformation if you need it

    thanks
    Last edited by marouen; 07-11-2009 at 04:02 PM.

  4. #4
    Join Date
    Jun 2009
    Posts
    6

    Default Found it

    Actually i figured it out,

    On the fields tab, i can add the "CURRENT_VERSION_INDICATOR" field and specify the "type of dimension update" for it as "last version (without stream field as source)". But why is the "stream field to compare with" still to be filled in order to get the transformation running?

    The "CURRENT_VERSION_INDICATOR" is not meant to be compared with any stream field, right ?

    The second issue is how could i get rid of the "version" field which is still mandatory, or should i always have in my dimension table a field indicating the version update number (even i know it's useful but not really in such a case)?

    thanks

  5. #5
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    But why is the "stream field to compare with" still to be filled in order to get the transformation running
    It's not.

    Code:
    The second issue is how could i get rid of the "version" field
    You can't. Feel free to file feature request in JIRA.

    All the best,
    Matt

  6. #6
    Join Date
    Jun 2009
    Posts
    6

    Default

    Quote Originally Posted by MattCasters View Post

    Code:
    But why is the "stream field to compare with" still to be filled in order to get the transformation running
    It's not.
    In fact Matt, in the lookup/udpate step, when i let the "stream field to compare with" empty for a field A that i want to specify as last versio in the "type of dimension update", i get an error message when verifying the transformation : Error Missing fields, not found in input from previous steps

    I had to fill the "stream field to compare with" with whatsoever existing stream field to get the stransformation running

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.