PDA

View Full Version : 1 dimension table, 2 fact tables: Cartesian Product Problem



diddy
08-13-2009, 09:15 AM
Hi,
I have created a metadata model for all our fact tables and dimensional tables.

I've created one ad-hoc report and joined one dimensional table with two tact tables. The dimensional table stores all unique users, the first fact table all logins and the second fact table all status updates. My ad-hoc report looks at all the users, the sum of their logins and the sum of their status updates.
The problem is that the result of the sums is way too high. If I do the joins seperately, I get the correct results.

I had a look at the SQL that the Metadata Editor creates. The problem is not with the Metadata Editor, but more with the SQL query itself. In the Metadata Editor I define a 1:N INNER JOIN between the user table and the logins table and 1:N INNER JOIN between the user table and the fact table. The SQL creates a cartesian product, as it tries to join all the results of the first join with the second join.

What would I have to change in the Metadata editor, so that this works properly?

Thanks,
Diddy

wgorman
08-13-2009, 09:53 PM
Can you give an example of the SQL it is generating, and what you'd expect it to generate? That would help me understand the scenario better.

Thanks,

Will

diddy
08-14-2009, 04:39 AM
Hi Will,
Thanks for your reply!
This is a simplified version (where I expect the problem to be):

SELECT
BT_SN_USER_SN_USER.msisdn AS COL0
,SUM(BT_SN_LOGIN_SN_LOGIN.sessions) AS COL1
,SUM(BT_SN_MINI_STORIES_SN_MINI_STORIES.events) AS COL2
FROM
sn_user BT_SN_USER_SN_USER
INNER JOIN
sn_login BT_SN_LOGIN_SN_LOGIN
ON
( BT_SN_USER_SN_USER.msisdn = BT_SN_LOGIN_SN_LOGIN.msisdn )
INNER JOIN
sn_mini_stories BT_SN_MINI_STORIES_SN_MINI_STORIES
ON
( BT_SN_MINI_STORIES_SN_MINI_STORIES.msisdn = BT_SN_USER_SN_USER.msisdn )
GROUP BY 1


This is the whole version (this is the original query generated by the Metadata editor):


SELECT
BT_SN_USER_SN_USER.msisdn AS COL0
,SUM(BT_SN_LOGIN_SN_LOGIN.sessions) AS COL1
,SUM(BT_SN_MINI_STORIES_SN_MINI_STORIES.events) AS COL2
FROM
sn_login BT_SN_LOGIN_SN_LOGIN
,sn_lt_event_type BT_SN_LT_EVENT_TYPE_SN_LT_EVENT_TYPE
,sn_lt_time_dimension BT_SN_LT_TIME_DIMENSION_EVENTS
,sn_lt_time_dimension BT_SN_LT_TIME_DIMENSION_LOGIN
,sn_mini_stories BT_SN_MINI_STORIES_SN_MINI_STORIES
,sn_user BT_SN_USER_SN_USER
WHERE
( BT_SN_LOGIN_SN_LOGIN.login_date_id = BT_SN_LT_TIME_DIMENSION_LOGIN.time_dimension_id )
AND ( BT_SN_USER_SN_USER.msisdn = BT_SN_LOGIN_SN_LOGIN.msisdn )
AND ( BT_SN_MINI_STORIES_SN_MINI_STORIES.event_id = BT_SN_LT_EVENT_TYPE_SN_LT_EVENT_TYPE.event_id )
AND ( BT_SN_MINI_STORIES_SN_MINI_STORIES.event_date_id = BT_SN_LT_TIME_DIMENSION_EVENTS.time_dimension_id )
AND ( BT_SN_MINI_STORIES_SN_MINI_STORIES.msisdn = BT_SN_USER_SN_USER.msisdn )

AND (
BT_SN_LT_EVENT_TYPE_SN_LT_EVENT_TYPE.event_name = 'STATUS UPDATE'
)
AND (
BT_SN_LT_TIME_DIMENSION_LOGIN.the_date >= '2009-07-01'
)
AND (
BT_SN_LT_TIME_DIMENSION_LOGIN.the_date <= '2009-07-31'
)
AND (
BT_SN_LT_TIME_DIMENSION_EVENTS.the_date >= '2009-07-01'
)
AND (
BT_SN_LT_TIME_DIMENSION_EVENTS.the_date <= '2009-07-31'
)
GROUP BY
BT_SN_USER_SN_USER.msisdn
;

If I would write a standard SQL query in the same manner, I would run into the same problem as well. The question is just, how can I avoid it with the Metadata Editor. I guess that I have to specify a special relationship.

These are the relationships for the 3 main tables that I have specified so far are:

BT_SN_USER_SN_USER -> INNER JOIN 1:N -> BT_SN_MINI_STORIES_SN_MINI_STORIES
BT_SN_USER_SN_USER -> INNER JOIN 1:N -> BT_SN_LOGIN_SN_LOGIN

... and then there are the other relationships with the other dimensional tables (which are all INNER JOIN 1:N as well).

All works will if I do only join BT_SN_USER_SN_USER to BT_SN_MINI_STORIES_SN_MINI_STORIES
or
BT_SN_USER_SN_USER to BT_SN_LOGIN_SN_LOGIN
but
joining all three
BT_SN_USER_SN_USER to BT_SN_MINI_STORIES_SN_MINI_STORIES to BT_SN_USER_SN_USER to BT_SN_LOGIN_SN_LOGIN
creates the cartesian product.

BT_SN_USER_SN_USER: dimensional table
BT_SN_LOGIN_SN_LOGIN: fact table
BT_SN_MINI_STORIES_SN_MINI_STORIES: fact table

Please let me know in case you need some more information.

Best regards,
Diddy

diddy
08-18-2009, 04:32 AM
I have finally figured out a way to get a correct result in SQL. The trick is to handle the SQL results as table data and then join these fake tables (so you run the two queries independently and join them together). Now the questions is how to model this in the metadata editor:



SELECT
a.msisdn,
a.sessions,
b.events
FROM
(
SELECT
BT_SN_USER_SN_USER.msisdn,
SUM(BT_SN_LOGIN_SN_LOGIN.sessions) AS sessions
FROM
sn_login BT_SN_LOGIN_SN_LOGIN
,sn_lt_time_dimension BT_SN_LT_TIME_DIMENSION_LOGIN

,sn_user BT_SN_USER_SN_USER
WHERE
( BT_SN_LOGIN_SN_LOGIN.login_date_id = BT_SN_LT_TIME_DIMENSION_LOGIN.time_dimension_id )
AND ( BT_SN_USER_SN_USER.msisdn = BT_SN_LOGIN_SN_LOGIN.msisdn )
AND (
BT_SN_LT_TIME_DIMENSION_LOGIN.the_date >= '2009-07-01'
)
AND (
BT_SN_LT_TIME_DIMENSION_LOGIN.the_date <= '2009-07-31'
)
GROUP BY 1
) a
INNER JOIN
(
SELECT
BT_SN_USER_SN_USER.msisdn,
SUM(BT_SN_MINI_STORIES_SN_MINI_STORIES.events) AS events
FROM
sn_lt_event_type BT_SN_LT_EVENT_TYPE_SN_LT_EVENT_TYPE
,sn_lt_time_dimension BT_SN_LT_TIME_DIMENSION_EVENTS
,sn_mini_stories BT_SN_MINI_STORIES_SN_MINI_STORIES
,sn_user BT_SN_USER_SN_USER
WHERE

( BT_SN_MINI_STORIES_SN_MINI_STORIES.event_id = BT_SN_LT_EVENT_TYPE_SN_LT_EVENT_TYPE.event_id )
AND ( BT_SN_MINI_STORIES_SN_MINI_STORIES.event_date_id = BT_SN_LT_TIME_DIMENSION_EVENTS.time_dimension_id )
AND ( BT_SN_MINI_STORIES_SN_MINI_STORIES.msisdn = BT_SN_USER_SN_USER.msisdn )
AND (
BT_SN_LT_EVENT_TYPE_SN_LT_EVENT_TYPE.event_name = 'STATUS UPDATE'
)
AND (
BT_SN_LT_TIME_DIMENSION_EVENTS.the_date >= '2009-07-01'
)
AND (
BT_SN_LT_TIME_DIMENSION_EVENTS.the_date <= '2009-07-31'
)
GROUP BY 1
) b
ON
a.msisdn = b.msisdn


Does somebody have an idea on how to model this in Metadata Editor?
Thanks,
Diddy