Hitachi Vantara Pentaho Community Forums
Results 1 to 7 of 7

Thread: slowly changing dimensions - from_date

  1. #1

    Default slowly changing dimensions - from_date

    I'm implementing a removal of records from SC dimensions. Not really really a removal, but rather setting date_to to current date minus 1 minute for the records that no longer exists in the input dataset. That's working just fine until the record appears in the data set again a few days later and Dimensional Update/Lookup component actually inserts a new record with the default date_from, date_to and version fields as there were no previous records whatsoever which obviously screws up the dimension. I would have expected current date to be set as from_date and increase of the version...

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    I did a few posts on the subject in the past, but let me get to the core of the issue.

    Think of the state of being deleted or not is just another attribute. Abusing the date_from field for this is not really a good idea.
    Consider adding fields such as "is_deleted" (Y/N), "deleted_date", etc.

    The meaning of the date_from field is considerably different.

    Matt

  3. #3

    Default

    Well, theory and practive are always different. While I would like nothing more than to have continuous cover for a member of a dimension (this is a real dimension used by OLAP reporting frontend) there are cases when you want to track missing records historically in the same dimensional table. Adding additional fields won't help OLAP frontend and update/lookup step to track that... My gripe is inability of having unique combination of key, version (dates could be tracked manually after all) natively from update/lookup step.
    Last edited by len0x; 12-18-2007 at 10:40 AM.

  4. #4
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    I feel no need to repeat myself. Besides, your gripes are unfounded ;-)

  5. #5

    Default

    I'm still mising a solution to a partular problem though: say every other day a record gets deleted and then inserted again. How do I track that in the dimensional table? (if its removed/added just once then yes you can have state fields etc.)

  6. #6
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    The [date_from,date_to[ time frame (date range) concerns the version of the dimension entry, NOT the actual values of the dimension attributes.

    Let's suppose you have this situation where a record gets deleted and undeleted somehow. Than the fact of being deleted was just a temporary state of the dimension entry. (I know I'm repeating myself)
    It's not hard at all to make a flag "is_deleted" and flag it as "insert" in the "Dimension lookup/update" operator to cause a new record to be inserted.
    The [date_from,date_to[ date range will reflect the time frame during which the record was in fact deleted.

    Again, if you want to have other attributes, you add them. All I'm saying is to not try to re-use the dimension entry validity range [date_from,date_to[

    Matt

  7. #7

    Default

    OK, I tried that and it seems to be working. Cheers.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.