US and Worldwide: +1 (866) 660-7555
Results 1 to 7 of 7

Thread: 1:many relationship

  1. #1
    Join Date
    Dec 2013
    Posts
    4

    Default 1:many relationship

    I have a situation where Saiku sometimes displays an unexpected/undesirable value, until I re-run the exact same calculation.

    It is where a fact table has a 1:many relationship with a dimension table.

    FACT TABLE:
    Code:
     customer_id | purchase_date | purchase_value
    -------------+---------------+----------------
          1      |       1       |        2          -- Customer 1 spent £2 on Day 1
          1      |       2       |        6          -- Customer 1 spent £6 on Day 2
          2      |       2       |        4          -- Customer 2 spent £4 on Day 3
    1:Many Dimension Table:
    Code:
     customer_id | sweet_ordinal | sweet_name
    -------------+---------------+------------
          1      |       1       |  Chocolate
          1      |       2       |  Toffee           -- Customer 1 has preferences for both Chocolate and Toffee
          2      |       2       |  Toffee           -- Customer 2 has a preference for just Toffee
    


    This is represented by the following XML Schema (everything is in-line so no need to create any tables)...

    Code:
      <!-- Cube definitions -->
      <Cube visible ="true" name="ONE TO MANY TEST" description="ONE TO MANY TEST" cache="true" enabled="true">
    
        <!-- Source Table -->
        <View alias="fact_indulgences">
          <SQL dialect="generic">
            <=!=[=C=D=A=T=A=[SELECT 1 AS [customer_id], 1 AS [purchase_date], 2 AS [purchase_value]
                     UNION ALL
                     SELECT 1 AS [customer_id], 2 AS [purchase_date], 6 AS [purchase_value]
                     UNION ALL
                     SELECT 2 AS [customer_id], 2 AS [purchase_date], 4 AS [purchase_value]]=]=>
          </SQL>
        </View>
    
        <!-- Dimensions -->
        <Dimension name="Sweety Preference" visible="true" foreignKey="customer_id" highCardinality="false">
          <Hierarchy name="Sweety Preference" allMemberName="All Preferences" allLevelName="(All Preferences)" hasAll="true" primaryKey="customer_id">
            <InlineTable alias="Sweety Preference">
              <ColumnDefs>
                <ColumnDef name="customer_id"      type="Numeric"/>
                <ColumnDef name="sweet_ordinal"    type="Numeric"/>
                <ColumnDef name="sweet_preference" type="String"/>
              </ColumnDefs>
              <Rows>
                <Row>
                  <Value column="customer_id"     >1</Value>
                  <Value column="sweet_ordinal"   >10</Value>
                  <Value column="sweet_preference">Chocolate</Value>
                </Row>
                <Row>
                  <Value column="customer_id"     >1</Value>
                  <Value column="sweet_ordinal"   >20</Value>
                  <Value column="sweet_preference">Toffee</Value>
                </Row>
                <Row>
                  <Value column="customer_id"     >2</Value>
                  <Value column="sweet_ordinal"   >20</Value>
                  <Value column="sweet_preference">Toffee</Value>
                </Row>
              </Rows>
            </InlineTable>
            <Level name="Sweety Preference" visible="true" column="sweet_ordinal" ordinalColumn="sweet_ordinal" nameColumn="sweet_preference" uniqueMembers="false" type="String" levelType="Regular" hideMemberIf="Never"/>
          </Hierarchy>
        </Dimension>
          
        <Dimension name="Purchase Day" visible="true" foreignKey="purchase_date" highCardinality="false">
          <Hierarchy name="Purchase Day" allMemberName="All Days" allLevelName="(All Days)" hasAll="true" primaryKey="day_id">
            <InlineTable alias="Days">
              <ColumnDefs>
                <ColumnDef name="day_id"      type="Numeric"/>
                <ColumnDef name="day_ordinal" type="Numeric"/>
                <ColumnDef name="day_name"    type="String"/>
              </ColumnDefs>
              <Rows>
                <Row>
                  <Value column="day_id"     >1</Value>
                  <Value column="day_ordinal">1</Value>
                  <Value column="day_name"   >Monday</Value>
                </Row>
                <Row>
                  <Value column="day_id"     >2</Value>
                  <Value column="day_ordinal">2</Value>
                  <Value column="day_name"   >Tuesday</Value>
                </Row>
              </Rows>
            </InlineTable>
            <Level name="Purchase Day" visible="true" column="day_id" nameColumn="day_name" ordinalColumn="day_ordinal" uniqueMembers="true" type="String" levelType="Regular" hideMemberIf="Never"/>
          </Hierarchy>
        </Dimension>
    
        <!-- Measures -->
        <Measure name="Total Sales Value" column="purchase_value" datatype="Numeric" aggregator="sum"            visible="true">
        </Measure>
    
        <Measure name="Unique Customers"  column="customer_id"    datatype="Numeric" aggregator="distinct-count" visible="true">
        </Measure>
    
      </Cube>

    I have written two MDX queries that I run in Saiku.

    1. Sales per Day, by Sweet Preference
    Code:
    SELECT
      NON EMPTY
        Hierarchize(
          [Purchase Day].[Purchase Day].Members
        )
      ON COLUMNS,
      NON EMPTY
        Hierarchize(
          {[Sweety Preference].[Sweety Preference].Members}
          *
          {[Measures].[Unique Customers], [Measures].[Total Sales Value]}
        )
      ON ROWS
    FROM [ONE TO MANY TEST]
    2
    . Sales per Day
    Code:
    SELECT
      NON EMPTY
        Hierarchize(
          [Purchase Day].[Purchase Day].Members
        )
      ON COLUMNS,
      NON EMPTY
        Hierarchize(
          {[Measures].[Unique Customers], [Measures].[Total Sales Value]}
        )
      ON ROWS
    FROM [ONE TO MANY TEST]

    The results that I get, however, depend very much on the order that I run things. I'm not sure if this reflects a bug in Mondrian/Saiku, if it reflects Mondrian not supporting my many:many relationship, or if it reflects something being missing from my schema.

    1). To replicate this behaviour, I stop and re-start my Saiku Server before any of these steps.

    2). I run my first query (Sales per Day, by Sweet Preference) and get the "correct results"
    Code:
     Sweety Preference | MeasuresLevel     | Monday | Tuesday 
    -------------------+-------------------+--------+---------
     Chocolate         | Total Sales Value |   2    |    6       <- Only Customer has a preference for Chocolate.
                       | Unique Customers  |   1    |    1          So, only Customer 1 data appears here
    -------------------+-------------------+--------+---------
     Toffee            | Total Sales Value |   2    |   10       <- Both customers have a preference for Toffee
                       | Unique Customers  |   1    |    2          So, both Customers' data appears here
    
    3). I run my second query (Sales per Day) , but I do not get the results I desire
    Code:
     MeasuresLevel     | Monday | Tuesday 
    -------------------+--------+---------
     Total Sales Value |   4    |   16      <- This is the sum on the Chocolate and Toffee data
     Unique Customers  |   1    |    2
    
    What I desire, however, is [ Total Sales Value | 2 | 10 ]. (Although Customer1 has a preference for two sweeties, I would like for their data to only be included once not twice.)

    4). From Saiku I refresh my cubes.

    5). Now I run my second query again. And I get the results I wanted.
    Code:
     MeasuresLevel     | Monday | Tuesday 
    -------------------+--------+---------
     Total Sales Value |   2    |   10
     Unique Customers  |   1    |    2
    
    6). Just to check, I run my first query again, and then my second query again. No change, it's all fine.

    7). I keep checking behaviour, and I refresh my cubes again.

    8). I run my first query, and all is fine.

    9). I run my second query, and I get the "wrong" results.

    10). I run my second query again, and I get the right results!


    It's that last part that confuddles me.

    > I can understand that the "wrong result" is coming from aggregating already cached fine-grain results. So, if I run the coarse-grain query first it gives the answer I want. At-least, until those results drop out of the cache.

    > What I don't get is how I can engineer a situation where by I get the "wrong result" (step 9) then run exactly the same query again and get the "right" result (step 10).


    Have I stepped over some line and am asking for behaviour that is not supported? Is the behaviour supported, but I have not adequately represented my needs in the Schema? Or is it just impossible ot get what I want?



    Cheers,
    Mat.

    (This has all been repeated with a much larger data-set, in real tables in a RedShift-PostGreSQL database. Ths simplified In-Line example is just for demonstration and repeatability.)



    I appear to have found a really painful work around.

    By creating dummy and hidden dimensions, I can test the level in the hierarchy and force aggregation and the fine grained level of one of the dummy dimensions. This forces the value to be calculated from the database, rather than by aggregating from the cached fine grained values, at the cost of introducing a redundant `WHERE dummy_id = 1` and a redundant `GROUP BY dummy_id`

    Source Data and Two New Redundant Dimensions (The value is always 1):
    Code:
        <!-- Source Table -->
        <View alias="fact_indulgences">
          <SQL dialect="generic">
            <=!=[=C=D=A=T=A=[SELECT 1 AS [customer_id], 1 AS [dummy_id], 1 AS [dummy_id2], 1 AS [purchase_date], 2 AS [purchase_value]
                     UNION ALL
                     SELECT 1 AS [customer_id], 1 AS [dummy_id], 1 AS [dummy_id2], 2 AS [purchase_date], 6 AS [purchase_value]
                     UNION ALL
                     SELECT 2 AS [customer_id], 1 AS [dummy_id], 1 AS [dummy_id2], 2 AS [purchase_date], 4 AS [purchase_value]]=]=>
          </SQL>
        </View>
    
        <!-- Dimensions -->
        <Dimension name="Dummy" visible="false" foreignKey="dummy_id" highCardinality="false">
          <Hierarchy name="Dummy" allMemberName="All Dummies" allLevelName="(All Dummies)" hasAll="true">
            <Level name="Dummy" column="dummy_id" uniqueMembers="true"/>
          </Hierarchy>
        </Dimension>
    I can then use these redundant dimensions in a calculated member:
    Code:
        <Measure name="Total Sales Value" column="purchase_value" datatype="Numeric" aggregator="sum"            visible="true">
        </Measure>
    
        <CalculatedMember name="Total Sales Value 2" dimension="Measures" visible="true">
          <Formula>
            IIF(
              NOT([Sweety Preference.Sweety Preference].Level IS [Sweety Preference.Sweety Preference].[Sweety Preference]),
              AGGREGATE(
                [Dummy.Dummy].[Dummy].Members,
                ([Measures].[Total Sales Value], [Sweety Preference.Sweety Preference].CurrentMember.Hierarchy)
              ),
              [Measures].[Total Sales Value]
            )
          </Formula>
        </CalculatedMember>
    Concerns:
    - I would need an additional dummy dimension for every additional multi-value dimension in my data.
    - All measures would need an additional "wrapper" CalculatedMeasure for every additional multi-value dimension in my data.
    - This is a hack, a more intellegent future mondrian engine may see this trick and optimise it away



    There must be a better way?
    Last edited by MatBailie; 12-17-2013 at 06:44 AM. Reason: More concrete Example & Later Findings

  2. #2
    Join Date
    Aug 2013
    Posts
    4

    Default

    Hi Matt - I loaded your schema into Pentaho Schema Workbench and was able to successfully publish this to my Pentaho BI Server with the inline views. I just had to fix the weird CDATA text that came across in the forum posting above.

    Everything seems to work according to your expected results in Mondrian and our OLAP query tool, "Analyzer". However, it looks like you are using the partner community tool for OLAP reporting called "Saiku".

    You can submit an issue request to find out what is going here directly to Saiku's GitHub issues site: https://github.com/OSBI/saiku/issues.

    Best regards, Mark Kromer

  3. #3
    Join Date
    Dec 2013
    Posts
    4

    Default

    Hi Mark,

    Thanks for taking the time to look at this. I've posted the issue in Saiku.

    Did you run your tests with or without my work-around? The reason that I ask is to begin to evidence the location of the unexpected behaviour as being within Saiku.

    And at Step 3 (using "Analyzer") you are getting the followng (which I only get at step 5 when using "Saiku")?
    Code:
     MeasuresLevel     | Monday | Tuesday
    -------------------+--------+---------
     Total Sales Value |   2    |   10
     Unique Customers  |   1    |    2



    Cheers,
    Mat.

  4. #4
    Join Date
    Sep 2008
    Posts
    845

    Default

    This is definitely not a saiku issue. We depend solely on mondrian to calculate this.
    Can you swap your mondrian jar in saiku/lib with the one thats in web-inf/lib and try this again?
    Check out the saiku project ( online demo ) - next generation open source analytics

    Quote Originally Posted by bugg_tb
    Join the Unofficial Pentaho IRC channel on freenode.
    Server: chat.freenode.net Channel: ##pentaho

    Please try and make an effort and search the wiki and forums before posting!

  5. #5
    Join Date
    Sep 2008
    Posts
    845

    Default

    Btw, looking at your "fact table" and "dimension" table

    its not possible to have 1 fact row reference 2 dimension rows - this will cause arbitrary results - which you experience from my point of view

    i dont think mondrian can understand this
    Check out the saiku project ( online demo ) - next generation open source analytics

    Quote Originally Posted by bugg_tb
    Join the Unofficial Pentaho IRC channel on freenode.
    Server: chat.freenode.net Channel: ##pentaho

    Please try and make an effort and search the wiki and forums before posting!

  6. #6
    Join Date
    Aug 2013
    Posts
    4

    Default

    Current Mondrian implementations do not have specific handling for these sorts of M:M fact-to-dimension mappings where a data modeler would typically design the model with a bridge table to address multiple dimension rows
    Last edited by mkromer; 12-17-2013 at 01:19 PM.

  7. #7
    Join Date
    Aug 2013
    Posts
    4

    Default

    Those were the results I got with Analyzer. But could have been simply due to arbitrary luck.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •