US and Worldwide: +1 (866) 660-7555
Results 1 to 8 of 8

Thread: Count distincts and agg tables

  1. #1
    Join Date
    Apr 2007
    Posts
    1,989

    Default Count distincts and agg tables

    Hi,

    I read somewhere that mondrian is very aware that count distinct's can't be aggregated if you're using agg tables.

    So; It seems to me therefore that this means if it can't find an exactly match agg table for the currently selected dimensions and levels then it must go to the fact table, and not sum up from a different level agg table. That has to be the case..

    So; How come mondrian still generates a sum around the measure when pulling it from the agg table? See here:

    Code:
    2013-10-02 12:39:55,153 DEBUG [mondrian.sql] 20: Segment.load: executing sql [select "dim_date_1"."date_string" as "c0", "dim_estate"."estate_name" as "c1", "dim_customer"."customer_name" as "c2", "dim_sessions_type"."sessions_type_name" as "c3", sum("fact_radius_distinct_d_ec_s"."users") as "m0" from "dim_date" as "dim_date_1", "fact_radius_distinct_d_ec_s" as "fact_radius_distinct_d_ec_s", "dim_estate" as "dim_estate", "dim_customer" as "dim_customer", "dim_sessions_type" as "dim_sessions_type" where "fact_radius_distinct_d_ec_s"."end_date_id" = "dim_date_1"."date_id" and "dim_date_1"."date_string" = '2013-09-24' and "fact_radius_distinct_d_ec_s"."estate_id" = "dim_estate"."estate_id" and "dim_estate"."estate_name" = 'O2WIFI' and "fact_radius_distinct_d_ec_s"."customer_id" = "dim_customer"."customer_id" and "dim_customer"."customer_name" = 'Costa Limited' and "fact_radius_distinct_d_ec_s"."session_type_id" = "dim_sessions_type"."sessions_type_id" and "dim_sessions_type"."sessions_type_name" = 'NORMAL' group by "dim_date_1"."date_string", "dim_estate"."estate_name", "dim_customer"."customer_name", "dim_sessions_type"."sessions_type_name"]
    Any ideas?

    In this case it's fine

    However I just removed one of the dimensions from the query, which should mean the aggregate table isnt used. But it continues to use it. This is surely wrong?

    Dan

  2. #2
    Join Date
    Feb 2007
    Posts
    6

    Default

    Quote Originally Posted by codek View Post

    However I just removed one of the dimensions from the query, which should mean the aggregate table isnt used. But it continues to use it. This is surely wrong?
    What version of mondrian do you use?
    Do you define your aggregates explicitly? There was a bug with distinct-count measure in mondrian < 3.6 if aggregates was defined explicitly.
    http://jira.pentaho.com/browse/MONDRIAN-1370
    http://lists.pentaho.org/pipermail/m...ry/004290.html

  3. #3
    Join Date
    Apr 2007
    Posts
    1,989

    Default

    ahhhh ha. Many thanks, that is _exactly_ it. I'll download a 3.6jar and check.

  4. #4
    Join Date
    Apr 2007
    Posts
    1,989

    Default

    oh; according to jira that fix has not been backported to 3.x mondrian only fixed in 4.0!

  5. #5
    Join Date
    Feb 2007
    Posts
    6

    Default

    Quote Originally Posted by codek View Post
    oh; according to jira that fix has not been backported to 3.x mondrian only fixed in 4.0!
    I'm sure it is commited in 3.6 https://github.com/pentaho/mondrian/...d33a308bc1c5bf

  6. #6
    Join Date
    Apr 2007
    Posts
    1,989

    Default

    hmm ok which build on ci is 3.6 then? Or isnt there one? In which case where can i get the 3.6 jar?

  7. #7
    Join Date
    Feb 2007
    Posts
    6

    Default

    Quote Originally Posted by codek View Post
    hmm ok which build on ci is 3.6 then? Or isnt there one? In which case where can i get the 3.6 jar?
    Officially mondrian 3.6 is only in EE. Unofficially you can build 3.6 from source (tag 3.6.0.1-R).
    But if you download master build from ci http://ci.pentaho.com/view/Analysis/job/mondrian/ it should be for now very similar to 3.6.

  8. #8
    Join Date
    Apr 2007
    Posts
    1,989

    Default

    we are EE.


    I tried the master build, but it has no jars in the lib folder. So it seems broken. Also the download is 25mb where the 3.5 one is more like 60mb. However i found a mondrian.jar in the workbench archive instead. And that does seem to work. ( My checkout from GIT is still running, the connection here isnt very good!)

    So in summary mondrian goes to the agg table when at the right level, and when not at the right level it goes to the fact table, albeit with the messy sql below:

    select count("m0") as "c0" from (select distinct "fact_radius_sessions"."user_id" as "m0" from "fact_radius_sessions" as "fact_radius_sessions") as "dummyname"
    I Do think mondrian should add some further checking or warnings that if it ever sums up a distinct count measure it should be obvious something is wrong.
    Last edited by codek; 10-02-2013 at 09:42 AM.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •