Hitachi Vantara Pentaho Community Forums
Results 1 to 7 of 7

Thread: Best Practices for a "Current Flag" in a SCD type 2 dimension?

  1. #1
    Join Date
    May 2006
    Posts
    151

    Question Best Practices for a "Current Flag" in a SCD type 2 dimension?

    Hi!

    I'm curious what people are using to implement a "current" flag in type 2 slowly changing dimensions. (By "current" flag, I mean a "yes/no" type of column that marks the last version of the row, basically to make it easier to build reports that show only the current state of affairs).

    I have tried to implement this based on the "Dimension lookup / update" step, and I finally ended up with the following solution:

    1) use a constant step to add the flag values 'is_current" (1) and "is_not_current" (0) to the stream
    2) use the "Dimension lookup / update" step to do an insert for changes in the SCD type 2 fields
    3) In the same "Dimension lookup / update" step, set the is_current flag in the dim table to 1 using "update" for the type of dimension update.
    4) right after the "Dimension lookup / update", use a lookup step to fetch (by key) the version number of the last version of the row
    5) with a calculator step, decrease the looked up version number by one to get the version number of the previous version
    6) set the current flag to 0 for the previous version using an update step (using the "natural key" + the version number generated at 5)

    Now, this solution works but I am not very happy with it because the lookup in step 4 feels unnecessary - I mean, i get the feeling I should be able to get the version field directly from the "Dimension lookup / update" step (because just like the surrogate key, it is generated by the "Dimension lookup / update" step).

    So, my question is, what do people do to implement this? I hope there is a more clever method than what I came up with.

    Thanks in advance, and kind regards,
    kind regards,

    Roland Bouman
    http://rpbouman.blogspot.com/
    @rolandbouman on twitter
    Author of "Pentaho Solutions" (Wiley, ISBN: 978-0-470-48432-6)
    Author of "Pentaho Kettle Solutions" (Wiley, ISBN: 978-0-470-63517-9)

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Those people would probably be best off using either of your proposed solutions or ... version 3.2.0.

    ;-)

  3. #3
    Join Date
    May 2006
    Posts
    151

    Thumbs up

    Hi Matt!

    thanks for your quick reply.

    Wow, so PDI 3.2.0 provides native functionality for this feature? That's awesome!

    I've been playing a bit with 3.2.0 RC1, and I see a few extra options in the Dimension Lookup/Update step, in particular there seem to be a few new "Type of dimension update" options.

    The "Date of last insert or update" and "Last version" look promising. I tried to use them but I couldn't figure out what their behaviour is exactly. Perhaps I didn't try hard enough, but can you perhaps provide a quick explanation how to use it? (that is, assuming these are the right options for this job)
    kind regards,

    Roland Bouman
    http://rpbouman.blogspot.com/
    @rolandbouman on twitter
    Author of "Pentaho Solutions" (Wiley, ISBN: 978-0-470-48432-6)
    Author of "Pentaho Kettle Solutions" (Wiley, ISBN: 978-0-470-63517-9)

  4. #4

    Default

    i am facing this situation soon too and thought about a different solution
    (which just exists in my mind yet :P)

    - set the current_flag in the dimension-update in all records to false including the new version
    (the third type of update in the list)
    - use the new surrogate key of the just inserted row for an update of the current_flag to true in an update step after it
    (i think the new surrogate key is in the output list of the dimension-update step)

  5. #5
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Perhaps a simple view with a "case|if|when" construct in it would do the trick as well.

  6. #6
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Roland, give me a week or so, I'd rather document it properly on the wiki. The various options are in any case pretty straightforward. The timestamp options are not the last to be included either. The ones included now cover the "trigger" replacement use-case but plenty of others exist.

  7. #7
    Join Date
    May 2006
    Posts
    151

    Wink

    Matt,


    thanks! Pls take your time - not trying to rush you.
    kind regards,

    Roland Bouman
    http://rpbouman.blogspot.com/
    @rolandbouman on twitter
    Author of "Pentaho Solutions" (Wiley, ISBN: 978-0-470-48432-6)
    Author of "Pentaho Kettle Solutions" (Wiley, ISBN: 978-0-470-63517-9)

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.