US and Worldwide: +1 (866) 660-7555
+ Reply to Thread
Page 2 of 2 FirstFirst 1 2
Results 11 to 13 of 13

Thread: loading many-to-many?

  1. #11
    Join Date
    Nov 1999
    Posts
    7,251

    Default

    Jeff, it sounds like you need multiple fact tables to model patient/treatment, patient/symptom, patient/doctor, ... relationships.
    Remember that a dimension model is subject oriented.

    Besides these things all I can say is: you can't hide complexity. Don't try to hide relationships if they're worth expressing :-)

    Cheers,
    Matt
    Matt Casters, Chief Data Integration
    Pentaho, Open Source Business Intelligence
    http://www.pentaho.org -- mcasters@pentaho.org

    Author of the upcoming book Pentaho Kettle Solutions by Wiley. Release date: mid-September 2010.

    Join us on IRC server Freenode.net, channel ##pentaho

  2. #12

    Default

    Matt, I'm in agreement about not hiding complexity. Sometimes the use of a bridge table for a multi-valued dimension is a useful approach for modeling the complexity. For a fuller explanation, see Ralph Kimball's article at

    http://www.dbmsmag.com/9808d05.html

    --Jeff Wright
    ThotWave Technologies

  3. #13
    Join Date
    Nov 1999
    Posts
    7,251

    Default

    Database designers usually take one of four approaches to this kind of open-ended, multivalued situation:
    1. Disqualify the Diagnosis dimension because it is multivalued
    2. Choose one value (the "primary" diagnosis) and omit the other values
    3. Extend the dimension list to have a fixed number of Diagnosis dimensions
    4. Put a helper table in between this fact table and the Diagnosis dimension table.
    From the article it sounds like Mr. Kimball is not really a fan of either 4 solutions and neither am I. He disqualifies #3 as being "a hack" but how can it be a hack if it's a user requirement? :-) OK, it's a very specif case but if you have a set of patient-symptom relationships, I'm sure you can model them with a fact table. Go ahead and call that a "helper table" all you want but since it's referencing dimensions and expressing relationships I think it can stick.
    Although the helper table clearly violates the classic star join design where all the dimension tables have a simple one-to-many relationship to the fact table, there is no avoiding the issue of what to do with multivalued dimensions that designers insist on attaching to a fact table.
    I thought it would never happen but I actually disagree :-)

    Fortunately, designers rarely insist on attaching a multivalued dimension to a set of measurements.
    That makes sense since you would have a set of separate records and it's still as hard as on the OLTP system to perform any meaningful analysis on it.

    On a related topic we actually found a way for the OpenMRS project to solve it using #3. The idea is to perform an analysis of the source system (every day for example) and to dynamically change the ETL, the dimensions and facts involved and even the metadata that gets exposed to the clients, on the fly. Since Pentaho can handle dynamic ETL and auto-generated metadata it's very possible.
    The reason for doing this is that the attributes are stored in key/value tables and even the data types are "dynamic". I wouldn't call it "a hack", it's just technology that Mr. Kimball didn't consider back then :-)

    Cheers,

    Matt
    Matt Casters, Chief Data Integration
    Pentaho, Open Source Business Intelligence
    http://www.pentaho.org -- mcasters@pentaho.org

    Author of the upcoming book Pentaho Kettle Solutions by Wiley. Release date: mid-September 2010.

    Join us on IRC server Freenode.net, channel ##pentaho

+ Reply to Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts