Hitachi Vantara Pentaho Community Forums
Results 1 to 19 of 19

Thread: Use of SCD (Slowly Changing Dimension)

  1. #1
    Join Date
    May 2008
    Posts
    232

    Default Use of SCD (Slowly Changing Dimension)

    Hi All,

    I am working on data extraction through pentaho kettle in my project. All the transformations are perfectly working, but when it came to implementation of SCD through "Dimensionl lookup/update" and "Combination lookup/update" component i am stuck up.

    Type-1 SCD that is "UPDATE" alone, is working fine but when the type-2 that is "INSERT" comes into picture it starts behaving abnormally....

    According to the project requirement i have to implement both type-1 and type-2 SCD in the same table.....

    Can anyone please me to sort this out........

    thanks in advance.......

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    In that case you need the "dimension lookup/update" step.

  3. #3
    DEinspanjer Guest

    Default

    Quote Originally Posted by meenaiyer View Post
    it starts behaving abnormally...
    Need more details here. The second tab of the Dimension lookup/update should be able to handle both type 1 and type 2 columns.

    Could you post an example transformation along with specific input and output examples of what you want the dimension table to end up with?

  4. #4
    Join Date
    May 2008
    Posts
    232

    Default

    Thanks MATT ....
    but whenever i am using insert option all the rows which are already present previously, are getting append again...... with version as 2.

    Can you please tell me that whether i will have to use the "Dimension lookup/update" even for initial loading as well??

    Please give an elaborate discription....Thanks a lot

  5. #5
    DEinspanjer Guest

    Default

    Quote Originally Posted by meenaiyer View Post
    but whenever i am using insert option all the rows which are already present previously, are getting append again...... with version as 2
    That would seem to indicate that one of your key lookup tests isn't configured right. Are you maybe dealing with nulls?
    A copy of the transformation would help us figure it out.

  6. #6
    Join Date
    May 2008
    Posts
    232

    Default

    @ DEinspanjer......
    The type of dimension i am using is like Employee_Dim
    where "Employee_Id_No" and "Project_Id" is a composit primary key which is been used as lookup field, and i need "Employee_Email" and "Employee_mobile" to be updated and "Role_name" of employee in the project to be Inserted that is history should be maintained...

    And i want some other attributes like "Employee_Name" in the same dimension table ,to be untouched....i dont want any kind of SCD on those attributes to be applied. But this "Dimension lookup/update" is not allowing me to leave any of the attributes untouched.
    All the attributes has to be in "update","Insert" or "Punch through" category.....!!!!

    I am just stuck up here......please provide an elaborate solution of the above discription

    Thanks ......

  7. #7
    Join Date
    May 2008
    Posts
    232

    Default

    Hi ..
    can anyone please tell me that use of NULL values in Dimension lookup/update steps will create problem???

    i am using the attributes in "Keys" and "Fields" tab of Dimension lookup/update steps ,which contain NULL valuse , but i am not getting proper values......the number of rows which contain NULL values are getting repeated again and again....

    Is the problem is because of NULL values??

    Please help...
    Thanks..

  8. #8
    DEinspanjer Guest

    Default

    Yes null keys will cause a problem with the Dimension lookup/update step. There was a long thread here recently talking about it. You should do a search for it just for research sake.

  9. #9
    Join Date
    May 2008
    Posts
    232

    Default

    Thanks DEinspanjer.....

    I checked that thread and it was really healpful.....
    Thanks a lot again....

  10. #10
    Join Date
    May 2008
    Posts
    232

    Default

    Can someone please tell me how to do only inset, to a fact containing a surrogate key, and it should not have any extra column like "version","date_to" and "date_from"...

    Thanks..

  11. #11
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Without a date range, why even bother generating a surrogate key?

  12. #12
    DEinspanjer Guest

    Default

    Well for me, I want to use surrogate keys to guard against natural key changes over time, but if my dimension does not contain entries that change over time then the additional overhead of the version and date fields just feels unnatural.
    In those cases, I try to use the combination lookup/update step instead. This step might be what meenaiyer is looking for.

  13. #13
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Ah, a junk dimension, that might make sense :-)

  14. #14
    Join Date
    May 2008
    Posts
    232

    Default

    Thanks DEinspanjer...
    I think you have really understood my problem, but my trouble its not yet over. i have one attribute called "Date_Id" in the tabel, which takes the current date and load it in the table whenever that transformation is executed. That attribute can not be used as the lookup as my transformation will get loaded on the daily basis. Since the date_id obviously will not match and all the data will get loaded again even tough there is no changes.
    And combination lookup/update provides me only one feild which is "Key feild(to lookup row in table)" so here i will have to ignore "Date_Id" and i will not be able to loade the current date_id.....

    I hope you have understood my problem....
    Please give a solution for that.

    Thanks

  15. #15
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    I never thought I would say this, but you might consider a trigger for that.

  16. #16
    Join Date
    May 2008
    Posts
    232

    Default

    Can you please explain what this Trigger is ?? sorry i dont have any idea about this or how it works... Can you please elaborate this..

    Thanks

  17. #17
    DEinspanjer Guest

    Default

    Hrm. I'm afraid you might be kinda stuck here as far as Kettle goes.

    The Trigger approach Matt suggested is a feature supported by many databases where you can configure SQL procedure to run whenever a particular event occurs. In this case, you could configure a procedure to insert the correct date_id into new records whenever they are inserted. Your DBA would hopefully be able to help you with that.

    Your use case now fits pretty much exactly with what I have needed several times in the past. The ability to have a dimension that is neither slowly changing nor a junk dimension.
    I prefer to use dimension tables with technical keys even when they aren't slowly changing because many of the natural keys I work with can be large and not something I'd want to have in my fact table. I also sometimes use auditing fields in dimensions.

    I've had this discussion a few times with Matt and others and they just don't see this as being a valid use case for Kettle. The answers I were given is to either make it a slowly changing dimension or don't use a technical key and use the insert/update step instead.

    My personal feeling is that a hybrid between insert/update and dimension would be useful such that:

    1. It can return technical keys in the same way that combination dimension lookup/update does
    2. It can cache data to prevent unnecessary DB IO
    3. It doesn't version records, but instead merely updates the existing record

    Then, the existing Dimension Lookup/Update step could be renamed to SCD Dimension Lookup/Update (since that is the *only* type of dimension it can operate on) and this new one could be called Simple Dimension Lookup/Update or something.

    Maybe one day I'll get around to writing it. Those steps aren't lightweight though and I haven't had the time to dedicate to the problem so far.

  18. #18
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Actually Daniel, I think there *are* various interesting things we can do to improve the current offerings.
    One interesting set of steps I would like to add would make it easier to create an organizational memory architecture. Interesting towards SOX compliance etc.
    Another feature request that is open is the ability to add all kinds of timestamps and user names. Pressed for time, can't find the JIRA case.

    I think the time is about right to push a number of steps a bit forward in 3.2. At the moment though, I have my hands full with 3.1 :-|

    Matt

  19. #19

    Default Simple Dimension Lookup/Update

    Quote Originally Posted by DEinspanjer View Post

    1. It can return technical keys in the same way that combination dimension lookup/update does
    2. It can cache data to prevent unnecessary DB IO
    3. It doesn't version records, but instead merely updates the existing record
    Hi DEinspanjer,

    That is just what I want.

    Regards,
    -yw kim.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.