US and Worldwide: +1 (866) 660-7555
+ Reply to Thread
Page 1 of 2 1 2 LastLast
Results 1 to 10 of 13

Thread: loading many-to-many?

  1. #1
    Join Date
    Oct 2008
    Posts
    22

    Default loading many-to-many?

    i am in need of some general guidance on how to load a bridge table in a dwh. if i understand all the literature around this (and i think! i do), we want to avoid an actual bridge table in the oltp sense whereby we end up having a unique bridge key on the fact table. this gives the appearance of a 1:M and is helpful for BI tools. okay - so i'm on board, but totally at a loss for how to load such a thing.

    here is my problem statement:

    this is a dwh of real estate properties where one sales agent has many properties and a property may be associated to one or more sales agents. (see my (generalized) model attached.)

    what just bends my brain is how do i formulate a unique key for each unique group of agents coming in once we disassociate from that which defines that group (the property). what would be the order of operations? ugh. in the kimball etl tool kit the recommendation is

    lookup the group
    if a match is found, insert group key in the fact table
    if no match is found create a new group and add the existing group table, then insert the group key into the fact table.

    i'm sure there are combination lookup/updates and db look ups here - but what is the order of operations and what would be the steps.

    any guidance is helpful, including alternate model proposals. thank you in advance
    Attached Images

  2. #2

    Default

    Take a look at the Closure Generator step....
    Al.

  3. #3
    Join Date
    Mar 2008
    Posts
    101

    Default

    There are different types of bridge tables.
    The closure generator handles one case, but not the other.
    See http://jira.pentaho.com/browse/PDI-1595

    -- Peter

  4. #4
    Join Date
    Oct 2008
    Posts
    22

    Default

    this is not a hierarchy - do you still recommend this closure generator step?

    after a couple weeks of scattered research, i finally found an article today that describes the etl process involved in this particular bridge concept. the method uses set logic. just need to translate this to pentaho now. you have to register for dmreview to get this article http://www.dmreview.com/dmdirect/200...0001968-1.html

  5. #5

    Default

    Quote Originally Posted by kvogelan View Post
    this is not a hierarchy - do you still recommend this closure generator step?

    after a couple weeks of scattered research, i finally found an article today that describes the etl process involved in this particular bridge concept. the method uses set logic. just need to translate this to pentaho now. you have to register for dmreview to get this article http://www.dmreview.com/dmdirect/200...0001968-1.html
    Forgive me for saying so but it seems that the data model you posted complicates things way too much.

    In my opinion, using a type 2 SCD seem better suited as it would allow you handle versions of a entry combined with a date range (from/to).

    I am personally not crazy about bridge tables. Will BI reports (canned or adhoc) be created against that data? It is going to be painful for the report writer(s) to come up with a workable solution.....and the users are going suffer as the reports will be long-running. Using a bridge table will box you (the modeler/architect) in: you will have to ETL the data out and push it somewhere else in a form that is more report-friendly.

    I take back what I said about the closure generator. Take a look at the "Dimension lookup/update" step instead.

    Enough of my ramblings. My 2 cents.
    Al.

  6. #6
    Join Date
    Oct 2008
    Posts
    22

    Default

    no problem, i was hoping to hear another opinion on this design. its not mine - tis kimball and i have seen a number of other articles with the same data structures, like http://www.pythian.com/blogs/364/imp...ta-warehousing and the other link i mentioned. it certainly complicates the etl process (whoa!), but i am taking it on b/c i understood that it actually simplifies data access for bi tools b/c you end up with a 1:M from the dims to the facts. you are saying otherwise. i will think more carefully about what you have said. what might you do differently?

  7. #7
    Join Date
    Nov 1999
    Posts
    7,251

    Default

    The relationship between SCD and fact is always 1:M, very true.

    A bridge table often expresses a relationship between 2 dimensions and as such you always get a fact table as a result.
    If there are no other facts involved, the fact table is often described as a "factless fact table".

    In the specific case of hierarchical/recursive data structured like Company/Branch etc this is no different.
    You just have 2 dimensions linked with a fact and a third dimension (junk often) expressing the relationship.

    That in itself often doesn't always make it easy enough to report on. In a lot of cases you therefor create derivative or analytical fact tables that look for key metrics like the top 2 levels and the lowest level. With a fixed number of levels to report on, you can indeed report easily on the hierarchical data.

    That is all in the general sense and of-course based on some personal experience. I didn't really go into your specific case.
    Matt Casters, Chief Data Integration
    Pentaho, Open Source Business Intelligence
    http://www.pentaho.org -- mcasters@pentaho.org

    Author of the upcoming book Pentaho Kettle Solutions by Wiley. Release date: mid-September 2010.

    Join us on IRC server Freenode.net, channel ##pentaho

  8. #8
    Join Date
    Nov 1999
    Posts
    7,251

    Default

    It would be a sad day if I were to put up a picture on my blog like this one :



    Please don't follow ill-given advice like that. Remember: express relationships with fact tables (even if they are fact-less) in your data warehouse.
    Matt Casters, Chief Data Integration
    Pentaho, Open Source Business Intelligence
    http://www.pentaho.org -- mcasters@pentaho.org

    Author of the upcoming book Pentaho Kettle Solutions by Wiley. Release date: mid-September 2010.

    Join us on IRC server Freenode.net, channel ##pentaho

  9. #9

    Default

    I do think it's worth considering Matt's response of considering whether a fact table can serve the need you're trying to address. For example, does it make sense to have a "listing" fact table that expresses the event that a property is listed by a realtor, perhaps on a date?

    However, I've used bridge tables where there was a legitimate m:m relationship between facts and dimensions. The classic example is healthcare. It's very natural to have an episode/encounter fact table in a healthcare data warehouse. But a single patient episode can have >1 symptom, >1 treatment, etc.

    It is difficult to load a bridge table like this. I didn't try to read all the DMReview article that was referenced, but part of the logic I created was to define natural keys along the lines of the concatendated string information described in this article. Kettle has soem useful transforms for this type of thing, such as row normaliser/denormaliser, and the "concatenate strings" aggregation function in the Group By transform.

    --Jeff Wright
    ThotWave Technologies

  10. #10
    Join Date
    Nov 1999
    Posts
    7,251

    Default

    I did one for a bank years ago.

    Dimension 1 : account
    Dimension 2 : customers

    A customer can have multiple accounts
    An account can be owned by multiple customers.

    That being said, there is a certain relationship being expressed every time:
    - owner
    - beneficiary
    - co-owner
    - has permission to access the account
    - etc.

    That relationship is then modeled with the 2 dimensions linked via a fact table together with a third dimension describing the relationship. (codes, descriptions, etc, typically a small junk dimension)

    Once I had the low-level data modeled I got questions from our users like : "I only want to see customers with at least 1 non-empty account".
    You solve this by creating a new analytical fact table that uses the "bridge/fact-less" fact table to solve the question.

    What you are doing in that process is refine the data, lift it to higher level. That's when you are building a data warehouse.

    Cheers,
    Matt
    Matt Casters, Chief Data Integration
    Pentaho, Open Source Business Intelligence
    http://www.pentaho.org -- mcasters@pentaho.org

    Author of the upcoming book Pentaho Kettle Solutions by Wiley. Release date: mid-September 2010.

    Join us on IRC server Freenode.net, channel ##pentaho

+ Reply to Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts