Hitachi Vantara Pentaho Community Forums
Results 1 to 3 of 3

Thread: Distinct count performance improvement techiniques?

  1. #1
    Join Date
    Jan 2007
    Posts
    10

    Default Distinct count performance improvement techiniques?

    Hi All,

    Currently we have cube structure which have good number of measures with distinct count aggregate function.This really slows down the performance of analytics report execution.

    *Apart from tunning the db, is there any way to improve the performance of distinct count queries?

    *Will there be any performance gain by using virtual cube concept (having distinct count measures in one cube and rest of measures in another cube)?

    *According to docs aggregate tables can work with distinct count only if they are defined at logical level.As such, we cannot define aggregates for all combination of dimension levels and cannot rely on few aggregate tables in addition to fact table as the aggregate function has to differ(either distinct count for fact or max for aggregate tables)?

    *Can we have view for aggregate tables?

    *I read that the future release will make use of grouping sets which will improve execution behaviour of mondrian engine...are there any plans to run queries in parallel also?


    Thanks
    AJ

  2. #2
    DEinspanjer Guest

    Default

    These are all important questions to me since many of the important measures that I need to analyze are distinct counts. Any answers available?

  3. #3
    Join Date
    Nov 1999
    Posts
    1,618

    Default Distinct count performance improvement techiniques?

    Quote Originally Posted by AJ_user View Post
    Apart from tunning the db, is there any way to improve the performance of distinct count queries?
    Aggregate tables are the main way.

    Quote Originally Posted by AJ_user View Post
    Will there be any performance gain by using virtual cube concept (having distinct count measures in one cube and rest of measures in another cube)?
    Separating the regular measures from the distinct-count measures might help in cases where there is no aggregate table of precisely the dimensionality required. Mondrian should be able to compute the the regular measures by rolling up the nearest aggregate table, while it goes to the fact table for the distinct-count measures. I haven't tried this; let me know what you find.

    Quote Originally Posted by AJ_user View Post
    According to docs aggregate tables can work with distinct count only if they are defined at logical level. As such, we cannot define aggregates for all combination of dimension levels and cannot rely on few aggregate tables in addition to fact table as the aggregate function has to differ (either distinct count for fact or max for aggregate tables)?
    The docs are correct. There is one case where mondrian can roll up distinct-count measures, that's where where the dimension(s) being rolled up is dependent on the foreign key which is the target of the distinct-count measure. For example, the gender column is dependent on customer_id, so mondrian can roll up distinct customers on gender. This makes sense: if you know you have 100 distinct male customers and 90 distinct female customers, you know you have 190 distinct customers.

    Quote Originally Posted by AJ_user View Post
    Can we have view for aggregate tables?
    Yes. (But if the view is complicated and takes a long time to evaluate, you won't see much performance gain.)

    Quote Originally Posted by AJ_user View Post
    I read that the future release will make use of grouping sets which will improve execution behaviour of mondrian engine... are there any plans to run queries in parallel also?
    We discussed executing queries in parallel. We chose to do the GROUPING SETS feature instead, and this was released in 2.4. Executing queries in parallel would still be useful, especially for databases which don't support GROUPING SETS, but we have no plans to support parallel execution.

    Julian

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.