US and Worldwide: +1 (866) 660-7555
+ Reply to Thread
Results 1 to 10 of 10

Thread: Sum distinct question

  1. #1
    Join Date
    Oct 2006
    Posts
    14

    Default Sum distinct question

    This question is about how to define an OLAP schema to sum values with duplications.

    Here is an example of the data:

    Contract|Product Group|Product|Profit
    5000|Stream|VOD|1000
    5000|Stream|TV|1000
    5001|Stream|VOD|2000
    5001|DVD|48h rental|2000
    5002|Stream|TV|1000

    The data represents 3 contracts with total profit of $4000 (contracts and profit are duplicated in case there is more than one product under the contract.

    The requirement is to sum the profit for distinct contracts, so that the totals are as follows:
    Product|Profit
    All|4000
    Stream|4000
    VOD|3000
    TV|2000
    DVD|2000


    Can this be implemented?

    Please note that the regular sum( distinct ) aggregator is summing the distinct profit values and not the distinct contracts so it will give a wrong result (3000$ for All) because it distincts on the profit value rather than the contract id.

    I think this can be solved by normalizing the profit to the count of distinct contracts and a calculated measure, but I am not sure that this will work for all levels.

    Thanks in advance...
    Last edited by udii; 07-14-2009 at 05:39 AM.

  2. #2
    Join Date
    Oct 2008
    Posts
    3

    Default

    If I understand the desired results correctly, the corresponding SQL would be the following:

    Code:
     
    select 'ALL' as agg_name, sum(profit) as profit
    from
    (select contract, sum(profit) as profit
     from
     (select contract, max(profit) as profit
      from products
      group by contract
     )
     group by contract
    );
     
    select contract, sum(profit) as profit
    from
    (select contract, max(profit) as profit
     from products
     group by contract
    )
    group by contract;
     
    select product_group, sum(profit) as profit
    from
    (select product_group, max(profit) as profit
     from products
     group by contract, product_group
    )
    group by product_group;
     
    select product, sum(profit) as profit
    from
    (select product, max(profit) as profit
     from products
     group by contract, product_group, product
    )
    group by product;
    giving the following results:
    Code:
     
    ALL     PROFIT
    ALL     4000
     
    CONTRACT    PROFIT
    5000            1000
    5001            2000
    5002            1000
     
    PRODUCT_GROUP   PROFIT
    DVD                     2000
    Stream                 4000
     
    PRODUCT         PROFIT
    48th rental       2000
    TV                  2000
    VOD                3000
    Assuming that's correct, it exposes that these measures are non-additive on a single hiearchy. So my initial thought would be to investigate separate hierarchies within the same PRODUCT dimension. It may be that a calculated measure can then define what you're looking for. Another idea may be to define separate fact tables (separate cubes) and create a virtual cube combining the summarized results.

    dlgrasse

  3. #3
    Join Date
    Oct 2006
    Posts
    14

    Default SOLUTION 1 : use an aggtable for each level

    Thank you for the reply, dlgrasse.

    You are suggesting to construct each level of the hierarchy separately and to later combine them into the same dimension or v-cube.

    Regarding the first part, you have shown the SQLs that can be used to produce the distinct-counts that are expected.

    If I understand correctly those SQLs should be defined as aggregation tables for the cube.

    I have only one question though, in my case there are actually 3 additional dimensions just like the product dimension, so the agg-table must contain all the possible combinations of those 4 dimensions which might result in a huge number of rows which may not be feasible.

    Still, it gives some kind of a solutions...

    thanks.

  4. #4
    Join Date
    Oct 2006
    Posts
    14

    Default agg tables

    Goodman has a nice post on how to use aggtable to implement a measure
    http://www.nicholasgoodman.com/bt/bl...mondrian-dumb/

    since I have multiple dimensions, some are also quite wide, this which result in a huge aggtable.

    such an aggtable can be defined as a view and the database might be able to provide the right numbers as Goodman suggests.

  5. #5
    Join Date
    Oct 2006
    Posts
    14

    Default SOLUTION 2 : use a custom aggregator

    It seems that in Mondrian one can develop and use custom aggrergators by implementing this interface:
    http://mondrian.pentaho.org/api/mond...ggregator.html

    have anyone done that before? can it work for my sum-distinct?
    Last edited by udii; 07-30-2009 at 06:48 AM.

  6. #6
    Join Date
    Oct 2007
    Posts
    254

    Default asdf

    Please forgive me if I'm missing something very obvious, but ... why not add a contract dimension? It may not have pretty member names like "Stream", and "VOD", but it would allow you to use a simple sum for the measure you're talking about.

    -Brian

  7. #7
    Join Date
    Oct 2006
    Posts
    14

    Cool use Contract dimension

    Quote Originally Posted by Phantal View Post
    Please forgive me if I'm missing something very obvious, but ... why not add a contract dimension? It may not have pretty member names like "Stream", and "VOD", but it would allow you to use a simple sum for the measure you're talking about.
    Brian, I see the incentive to put in a contract dimension, since this is the value I want to distinct on.

    Still I do not see how it can be used to drive the sums per product-groups I am looking for, so maybe you can further explain how adding the contract dimension can help.
    Last edited by udii; 07-31-2009 at 08:18 AM.

  8. #8
    Join Date
    Oct 2007
    Posts
    254

    Default

    Quote Originally Posted by udii View Post
    Brian, I see the incentive to put in a contract dimension, since this is the value I want to distinct on.

    Still I do not see how it can be used to drive the sums per product-groups I am looking for, so maybe you can further explain how adding the contract dimension can help.
    You'd also have a Product Group dimension. A typical dynamic OLAP situation would start out with each dimension's All member on rows, and your measures on columns. You'd then drill down to the first level of the Product Group dimension, and the first level of the Contract dimension. If your measure is just a simple sum, unless I'm missing something obvious you'll get exactly what you want.

    -Brian

  9. #9
    Join Date
    Oct 2006
    Posts
    14

    Thumbs up

    Phantal, I think that a simple sum will not work. for example, the total for [ProductGropu.Stream],[Contract.5000] will be 2000 since the contract appear twice and not 1000 as it should be, but,

    it seems that by having the Contract dimension we can average sum by the record count for each Contract which will result in what we need I hope...

    see http://forums.pentaho.org/showthread.php?t=67918

  10. #10
    Join Date
    Oct 2006
    Posts
    14

    Talking Resolved

    The answer to the question is very simple with the wonders of MDX,
    and the help of the community,
    here is what it takes:

    Code:
                    <Measure name="Profit" column="Profit" aggregator="sum" visible="true">
                    </Measure>
    		<Measure name="ContractCount" column="Contract" aggregator="count" visible="true">
                    </Measure>
    		<CalculatedMember name="AvgProfit" dimension="Measures">
                            <Formula>
                             [Measures].[Profit] / [Measures].[ContractCount]
                            </Formula>
                    </CalculatedMember>
                    <CalculatedMember name="DedupedProfit" dimension="Measures">
                            <Formula>
    			Sum([testContract].CurrentMember.Children, [Measures].[AvgProfit])
                            </Formula>
                    </CalculatedMember>
    For the complete source code visit http://www.tikalk.com/java/blog/sum-...ensional-model

    Thanks

+ Reply to Thread

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts