Hitachi Vantara Pentaho Community Forums
Results 1 to 3 of 3

Thread: Dimension table SCD2 with 'First seen in file' field

  1. #1
    Join Date
    Feb 2009
    Posts
    296

    Default Dimension table SCD2 with 'First seen in file' field

    I'm generally using a SCD type 2 (insert) technique to fill my dimension table. Now I would like to add a field that is only checked on the first insert, but not on updates. It is no key field and it does not contribute to the attributes.

    My example is the name of the source file that provided the first information about the dimension entry.

    Example:

    file1.csv
    Code:
    CustomerID | Customer Name
    1               | Frank
    2               | Alexander
    file2.csv
    Code:
    CustomerID | Customer Name
    2               | Alexandra
    3               | Maria
    file3.csv
    Code:
     CustomerID | Customer Name
     2               | Alexandra
    The resulting dimension table should look like this:
    Code:
    ID | CustomerID | CustomerName | FirstSeenInFile | CurrentFlag | Version
    0   |                 |                       |                     |                  | 1
    1   | 1              | Frank               | file1.csv         | Y              | 1
    2   | 2              | Alexander         | file1.csv         | N               | 1
    3   | 2              | Alexandra         | file2.csv         | N               | 2
    4   | 3              | Maria               | file2.csv         | Y                | 1
    As you can see the third file does _not_ create a new record, as 'Alexandra' is already known.

    I can't do this with the 'update'-type as it would change the record of ID 3 (changing the first seen to a LastSeenInFile).

    Does any one have an idea how to accomplish it? I was considering an execute SQL step but am unsure as to how to find the records that need updating.
    Fabian,
    doing ETL with his hands bound on his back

  2. #2
    DEinspanjer Guest

    Default

    What happens if you select that field as an insert only field? Does it misbehave?

  3. #3
    Join Date
    Feb 2009
    Posts
    296

    Default

    It would create a new dimension record which is not what I want as all the attributes are the same and the keys match. It's only this meta information that differs.
    It should basically work like the "date of last insert" type of update. It's just not the date I want it's the content of the field 'filename'.
    Fabian,
    doing ETL with his hands bound on his back

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.