Hitachi Vantara Pentaho Community Forums
Results 1 to 7 of 7

Thread: Design Question: Best way to `derive` foreign keys??

  1. #1

    Question Design Question: Best way to `derive` foreign keys??

    Gurus,

    Assume you have a star schema database.

    a) Every day, you have a Kettle transformation that extracts from an OLTP database the measure for your fact table (e.g. SALESREVENUE).

    b) You have 4 dimension tables that use surrogate keys for PK (a_la_Ralph_Kimball).

    c) Since the dimension tables of your star schema have a 1:M relationship with the fact table, how would go about deriving the foreign key values that will be inserted in your fact table with your measure?

    Thanks in advance for your input.

    Al.

    PS: Yes, it is my first star schema so my question is likely a no-brainer for some of you.

  2. #2
    Join Date
    May 2006
    Posts
    4,882

    Default

    Usually it's something like this:

    1) load the dimensions.
    2) replace the business-keys in the fact by lookups with the "FK/surrogate id". 1 lookup per dimension table. Insert fact.

    There are other ways... it all depends... in some cases you can let step 1 return the surrogate id of the dimension table. But I like the above

    Regards,
    Sven

  3. #3

    Default

    Responding to your mention of 1:M, keep in mind that one of the key principles of a dimensional data model is that there is exactly 1 dimension row per fact row. This avoids some common reporting errors:

    - If some fact rows are missing a corresponding dimension rows, reports that join the fact to the dimension with an inner join will lose facts with no dimension row.

    - If some fact rows have >1 dimension, reports that join the fact to the dimension will "double count" some facts.

    There are some advanced dimensional modeling techniques that allow you model 1:M fact:dimension relationships, but those are the exception, not the rule.

    --Jeff Wright
    ThotWave Technologies

  4. #4
    Join Date
    May 2006
    Posts
    4,882

    Default

    Quote Originally Posted by jwright View Post
    Responding to your mention of 1:M, keep in mind that one of the key principles of a dimensional data model is that there is exactly 1 dimension row per fact row. This avoids some common reporting errors:
    ??? strange ... 1 dimension row per fact row is more an exception than a key principle.

  5. #5
    DEinspanjer Guest

    Default

    I think maybe he just used a bit of a confusing wording.
    I believe he is saying that every FK in the fact table should reference not less than and not more than one record in the dimension table. That is indeed a key principle for standard dimensional modeling.

  6. #6

    Default

    Quote Originally Posted by DEinspanjer View Post
    I think maybe he just used a bit of a confusing wording.
    I believe he is saying that every FK in the fact table should reference not less than and not more than one record in the dimension table. That is indeed a key principle for standard dimensional modeling.
    Let's say that today, I sold 1 widget that is listed in my PRODUCT dimension under surrogate key value "10". I propagated "10" as FK value in the fact table for a SALES REVENUE of $XY.

    Tomorrow, I will sell 1 the same widget with the same surrogate key value. I will propagate the same FK value to my fact table for a SALES REVENUE of $XYZ.

    Bottom-line: there can be N occurrences of a FK value in a fact table but one and only one occurence of the "parent" record in a dimension table.

    Did I understand this correctly??

  7. #7
    DEinspanjer Guest

    Default

    Quote Originally Posted by acbonnemaison View Post
    Bottom-line: there can be N occurrences of a FK value in a fact table but one and only one occurence of the "parent" record in a dimension table.

    Did I understand this correctly??
    Yes. As mentioned above, the reason for this rule is to prevent over-counting if the dimension has more than one record with the key 10 and to prevent under-counting if the dimension has no record with the key 10 (in which case, an inner join would throw the fact record out)

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.