Hitachi Vantara Pentaho Community Forums
Results 1 to 14 of 14

Thread: The best way to implement SCD Type I

  1. #1
    Join Date
    Sep 2007
    Posts
    6

    Default The best way to implement SCD Type I

    My question to the community - which is the best method to implement SCD Type I. Obviously there are many ways, how to do it with PDI, no one is perfect for me:

    1. Insert/Update step - I do not get surrogate key on insert.

    2. Combination lookup/update step - I need to do update of non-key attributes in next step, which does not work on my database (locking problems on SQL Server 2000, anyway I did not try PDI 3.0, maybe solved)

    3. Dimension Lookup/Update - currently my favorite but ...
    - no usage for columns version, date from/to in SCD I
    ( what about to ignore this columns in case when all
    dimension update types are 'Punch through' ? )
    - default type of dimension update is 'Insert' and I have to
    change all to 'Punch through', which is terrible work if
    you have 20+ columns in your dimension (what about
    button 'All punch through' ?)

    4. ??? Any other way, how to implement SCD I

    Thanks for any hint

    Robert

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    If you don't need a technical / surrogate key, you can use insert/update or an insert strategy with error handling, very much like this: http://www.ibridge.be/?p=32

    Matt

  3. #3
    DEinspanjer Guest

    Default

    What if you do need the technical key?

    I have a dimension that is not SCD, it has a technical key that I need to retreive, and it has two other columns, a natural key that is used for lookup, and a new_entry field that is set to Y by the ETL, but after the business user processes it, they will change that value to N.

    I can't use insert/lookup even though it is the closest fit because it doesn't return TKs.
    I can't use Combination lookup/update because it would create a new entry after the existing record's new_entry field was set to N by the user.
    I don't really want to use Dimension lookup/update solely because of the extra SCD fields that would have to be put into the schema.


    Is it "weird" to have a dimension like this?

  4. #4
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Yes, very weird. :-)

    The only column that can be considered redundant in our version of the Slowly Changing Dimension is the "version" field. All other fields are required.

    That being said, in a strictly Type I dimension, you could also do away with the from-to date range fields.

    Well, you can use a very simple scenario where you have a lookup, a filter, an update and an insert step to solve this problem ;-)

    Matt

  5. #5
    DEinspanjer Guest

    Default

    Wow.. talk about incentive to look into patching insert/update to be able to return autogenerated keys...

  6. #6
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    What incentive would that be Daniel?
    Perhaps a "strictly type I" check box in "Dimension Lookup/Update" would be more appropriate?

  7. #7
    Join Date
    May 2006
    Posts
    4,882

    Default

    Has been discussed a couple of times in the past..."it doesn't make sense to return auto-generated keys in insert/update as you need to know which ID to use to do the lookup."

    Regards,
    Sven

  8. #8
    DEinspanjer Guest

    Default

    The four steps to get around the SCD versioning/date stamping fields was the incentive I was referring to. I guess what sboden said about returning an autogen on a lookup makes sense though.

    If it wasn't too much work, I do think that some way of turning off the type II fields for "strictly type I" would be a good solution.

  9. #9
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Well being a devil's advocate, suppose if you can encounter the situation where you have multiple natural keys and/or if you have natural keys that are non-numerical in nature.
    If that's the case AND if you only want to have a "Type I" dimension, then it's cleaner to add a technical key to the dimension to be placed in the fact table(s).

    Well, even in that case I would still recommend to add the date-range fields etc. The problem is that the decision to go fully "Type I" is in the best case made by end-users who can change their minds.
    In the worst case that decision is made by an IT dept unaware of the opinion of the end-users ;-)

    Matt

  10. #10
    Join Date
    Apr 2008
    Posts
    9

    Default

    This is an old thread, but I'm currently dealing with the issues at the core of it.

    Matt, you seem to be arguing that Type I SCDs should not even exist and that you know what DW implementors want better than they do. I find that arrogant. Besides, if Type I shouldn't exist, Kimball would not have documented it.

    I would like to see better support for Type I SCDs that does not require me to put version and date range columns on my dimension. Insert/Update doesn't work well for me because it doesn't provide the max+1 logic for the surrogate key. Replacing each dimension step with a "lookup, a filter, an update, [select max value, JavaScript to increment sequence,] and an insert" as you suggest is not a viable option.

    Please consider improving SCD Type I support.

  11. #11
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    I don't say I necessarily disagree with you but generally speaking it's usually not a good strategy to call someone arrogant to get something done for free.
    So I would suggest filing a JIRA case or going through your enterprise edition support channel.

    Just to be clear on this: simply putting an identity (or auto-increment) column on a dimension and then using the Insert/Update step would work fine.
    If you have a database that doesn't support that look into Getting a unique ID from a slave server.

  12. #12
    Join Date
    Mar 2006
    Posts
    170

    Default

    Quote Originally Posted by MattCasters View Post
    ...generally speaking it's usually not a good strategy to call someone arrogant to get something done for free.
    Classic :-)

  13. #13
    Join Date
    Apr 2008
    Posts
    9

    Default

    Matt, I apologize for my tone, and I certainly did not intend to convey that you are arrogant in general.

    I plan to file at least one JIRA case, and perhaps even submit patches.

    The way I would like this step to work is that if all update fields in the step are set to "Update" or "Punch through" then the version and date range fields become optional.

    I'd also like in the Type II case to be able to specify version, date range, or both. Currently both are required but they are not both necessary to implement Type II.

    Can you give me some feedback on my thoughts?

  14. #14
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Ken, I'm looking forward to seeing the both the JIRA case and the patches.
    Thanks in advance!
    Matt

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.