Hitachi Vantara Pentaho Community Forums

Thread: [Mondrian] Adding Grouping Set support for Distinct Count measures

  1. #11
    John V. Sichi Guest

    Re: [Mondrian] Adding Grouping Set support for Distinct Count measures

    Julian Hyde wrote:
    > Oops. I just realised we were talking about example #2, which is for
    > regular, non-distinct measures. If the MDX set/list contains duplicate
    > members, then the results are added. So I'm instructing delegates to say
    > 'aye' for Matt & Ajit.


    I won't argue with MSAS, so double-counting for dups it is. I'm
    surprised they don't allow you to put on half a state so you can do
    2.5*OR + CA.

    As Ajit points out, this means we have to be careful with pushing down
    constraints for non-distinct aggs to SQL. Rushan and I have been
    thinking about how to build on the distinct-count optimization she put
    in, with the next step being optimization and caching for non-visual
    totals via calculated members.
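
    To make the pushdown concern concrete, here is a minimal Java sketch
    (not Mondrian code). The class name, the method names, and the sales
    figures are all made up for illustration. It contrasts adding up a
    member list that contains a duplicate with what a SQL IN-list predicate
    would select for the same list.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DuplicateMemberSum {
    // Made-up unit sales per state, for illustration only.
    static final Map<String, Integer> UNIT_SALES = Map.of("OR", 10, "CA", 25, "WA", 7);

    // MDX-style evaluation: each list entry contributes, so a duplicate adds twice.
    public static int sumOverList(List<String> members) {
        int total = 0;
        for (String member : members) {
            total += UNIT_SALES.getOrDefault(member, 0);
        }
        return total;
    }

    // SQL-style pushdown: "WHERE state IN (...)" matches each fact row at most
    // once, no matter how often a value is repeated in the list.
    public static int sumWithInList(List<String> members) {
        Set<String> inList = new HashSet<>(members);
        int total = 0;
        for (Map.Entry<String, Integer> row : UNIT_SALES.entrySet()) {
            if (inList.contains(row.getKey())) {
                total += row.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> members = List.of("OR", "OR", "CA");
        System.out.println(sumOverList(members));   // 10 + 10 + 25 = 45
        System.out.println(sumWithInList(members)); // 10 + 25 = 35
    }
}
```

    With the list {OR, OR, CA} the two methods disagree (45 versus 35),
    which is why a naive IN-list pushdown could contradict the
    double-counting semantics for non-distinct aggregates.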

    JVS
    _______________________________________________
    Mondrian mailing list
    Mondrian (AT) pentaho (DOT) org
    http://lists.pentaho.org/mailman/listinfo/mondrian

  2. #12
    John V. Sichi Guest

    Re: [Mondrian] Adding Grouping Set support for Distinct Count measures

    John V. Sichi wrote:
    > As Ajit points out, this means we have to be careful with pushing down
    > constraints for non-distinct aggs to SQL. Rushan and I have been
    > thinking about how to build on the distinct-count optimization she put
    > in, with the next step being optimization and caching for non-visual
    > totals via calculated members.


    Oops, I meant for visual totals. Below is an example MDX query which
    demonstrates an extreme case (computing the visual grand total for some
    combination of large arbitrary subsets of three dimensions).

    With
      Set [*BASE_MEMBERS_Customer] as [Customers].[Name].members
      Set [*BASE_MEMBERS_Store] as [Store].[Store Name].members
      Set [*BASE_MEMBERS_Product] as [Product].[All Products].[Food].children
      Member [Customers].[*SUBTOTAL] As Aggregate([*BASE_MEMBERS_Customer])
      Member [Product].[*SUBTOTAL] As Aggregate([*BASE_MEMBERS_Product])
      Member [Store].[*SUBTOTAL] As Aggregate([*BASE_MEMBERS_Store])
    Select
      {[Measures].[Unit Sales]} on columns,
      ([Store].[*SUBTOTAL], [Customers].[*SUBTOTAL], [Product].[*SUBTOTAL]) on rows
    From [Sales]
    Where ([Time].[1997]);

    With latest Mondrian, on my laptop against Derby, it takes about 180
    seconds. And if I repeat it, the time does not go down, since the total
    cannot be cached.

    However, if I edit line 79 of AggregateFunDef.java to

        if ((aggregator == RolapAggregator.DistinctCount)
            || (aggregator == RolapAggregator.Sum)) {

    then the time goes down to 105s for the first execution, and 40s for
    subsequent executions. The speedup comes from pushing the sum
    computation down to SQL instead of letting Mondrian iterate over the big
    sparse 3D space.

    Derby is quite slow for the join/agg; if I run it on LucidDB, the
    improvement for the first MDX execution (where the SQL execution
    matters) is quite a bit larger, taking it down to 83s. (Subsequent runs
    are 40s as with Derby since the result is already cached, so SQL
    performance doesn't come into play.)

    I was wondering why the cached runs still took 40s, so I probed the
    stack and it was spending most of the time in
    AggregateFunDef.removeOverlappingTupleEntries. I commented this out,
    and the LucidDB time went down to 15s for the first execution, and 8s
    for the cached execution.

    To try this, you also have to set mondrian.rolap.maxConstraints=100000,
    because the IN list generated is huge. (If your DBMS doesn't like large
    IN lists, this won't be an attractive approach.) This implies cache
    bloat issues, since the IN-list values make up the cache key. And the
    overall approach raises optimization issues, since for many MDX queries,
    pushing down to SQL like this would be overkill.

    JVS


  3. #13
    John V. Sichi Guest

    Re: [Mondrian] Adding Grouping Set support for Distinct Count measures

    John V. Sichi wrote:
    > I was wondering why the cached runs still took 40s, so I probed the
    > stack and it was spending most of the time in
    > AggregateFunDef.removeOverlappingTupleEntries. I commented this out,
    > and the LucidDB time went down to 15s for the first execution, and 8s
    > for the cached execution.


    In eigenchange 10766, I changed AggregateFunDef to allow it to skip the
    list reduction methods added by Ajit (on a per dialect basis), because
    they are really slow. Currently it's only skipping for LucidDB, since
    it doesn't have a limit on IN list size, so the reductions aren't
    needed; for dialects where the reductions are required, it would be a
    good idea for someone to optimize Ajit's code.

    JVS


  5. #15
    Julian Hyde Guest

    [Mondrian] Changing the mondrian development process to prevent performance slippages

    > John Sichi wrote:
    >
    > RE: [Mondrian] Adding Grouping Set support for Distinct Count measures
    >
    > In eigenchange 10766, I changed AggregateFunDef to allow it to skip the
    > list reduction methods added by Ajit (on a per dialect basis), because
    > they are really slow.


    I'm beginning to think that I need to start running a tighter ship as
    regards performance. There have been several alleged performance slippages
    over the past year, but we've not caught them effectively. Our process is
    not strong enough to detect them at the time they are made, and after the
    event it is too difficult to figure out which change out of many caused
    performance to suffer.

    So, please, I'd like to hear suggestions for how we can change our process.
    It can't be purely a process change, because I don't personally have enough
    time/discipline to review each change as it is made and test its performance
    effects; there has to be some technology involved. Developers are
    responsible for ensuring that their change doesn't degrade performance, even
    on platforms that are not of interest to them personally, but it isn't
    enforced, so slippages occur. So we need a way to enforce that changes don't
    degrade performance, just as we have a regression suite to ensure that other
    aspects of mondrian's behavior are preserved.

    Since LucidEra and Thomson/Thoughtworks are the two largest groups besides
    Pentaho who have an interest in developing mondrian, I would like those two
    groups in particular to step up with suggestions and offers of help. Pentaho
    can provide resources to run the process and publish results, but can only
    offer limited leadership.

    Julian


  6. #16
    Ajit Vasudeo Joglekar Guest

    Re: [Mondrian] Adding Grouping Set support for Distinct Count measures

    The current logic can certainly be optimized for better performance. Could
    you please share the MDX queries that were taking especially long? We can
    start looking into this.

    Thanks

    -Ajit

    On 03/30/2008 01:42 PM, "John V. Sichi" <jsichi (AT) gmail (DOT) com> wrote:
    > In eigenchange 10766, I changed AggregateFunDef to allow it to skip the
    > list reduction methods added by Ajit (on a per dialect basis), because
    > they are really slow. Currently it's only skipping for LucidDB, since
    > it doesn't have a limit on IN list size, so the reductions aren't
    > needed; for dialects where the reductions are required, it would be a
    > good idea for someone to optimize Ajit's code.

  7. #17
    John V. Sichi Guest

    Re: [Mondrian] Adding Grouping Set support for Distinct Count measures

    Ajit Vasudeo Joglekar wrote:
    >
    > The current logic can certainly be optimized for better performance.
    > Could you please share the mdxs that were taking especially longer time?
    > We can start looking into this


    This is the one I used for comparison (intentionally pathological to
    make it easy to spot the problem while profiling). Customer Count
    provides the necessary DISTINCTCOUNT.

    With
      Set [*BASE_MEMBERS_Customer] as [Customers].[Name].members
      Set [*BASE_MEMBERS_Store] as [Store].[Store Name].members
      Set [*BASE_MEMBERS_Product] as [Product].[All Products].[Food].children
      Member [Customers].[*SUBTOTAL] As Aggregate([*BASE_MEMBERS_Customer])
      Member [Product].[*SUBTOTAL] As Aggregate([*BASE_MEMBERS_Product])
      Member [Store].[*SUBTOTAL] As Aggregate([*BASE_MEMBERS_Store])
    Select
      {[Measures].[Customer Count]} on columns,
      ([Store].[*SUBTOTAL], [Customers].[*SUBTOTAL], [Product].[*SUBTOTAL]) on rows
    From [Sales]
    Where ([Time].[1997]);

    JVS

  8. #18
    Matt Campbell Guest

    Re: [Mondrian] Changing the mondrian development process to prevent performance slippages

    At Thomson we have a performance test suite that we run semi-regularly. The
    suite involves a set of Cognos reports designed to be representative of
    typical use of the system. All reports are run in a "clean", 3-tiered
    environment where no other activity is happening. We typically run both
    sequential sets of tests as well as concurrent tests. We then collect
    report run times and compare to the previous run. For some test runs we also
    collect CPU and memory statistics. In the past these test results have
    clued us in to issues with Cognos, with our system configuration, our custom
    jdbc driver, and occasionally Mondrian.

    Our tests have not been run regularly enough to catch Mondrian performance
    problems when they happen, however. We don't integrate every revision of
    Mondrian into our system, so it may not be clear what change actually
    introduced an issue.

    What I would love to see is a nightly test suite that runs a set of queries
    with multiple configurations, collects timings, and then dumps a report to
    somewhere accessible. Even better would be to run it as part of the cruise
    and report back a % difference after each checkin, but that's probably not
    feasible if we want to test a large variety of configurations. Either way,
    if we can get % difference information on a regular basis we can react more
    quickly to new issues.

    Simply defining a set of queries and incorporating them into a separate
    JUnit test suite might be a step in the right direction.
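
    A report of that kind could start from something as small as the
    following Java sketch. QueryTimingReport and its methods are
    hypothetical names, and the Runnable stands in for whatever actually
    executes an MDX query; nothing here is an existing Mondrian or JUnit
    API. The timings in main are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryTimingReport {
    // Times a single run of a task, in milliseconds. The task is a placeholder
    // for code that executes one MDX query.
    public static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Percent change of the current timing relative to the baseline.
    // Positive means the current run is slower.
    public static double percentDifference(double baselineMs, double currentMs) {
        if (baselineMs <= 0) {
            throw new IllegalArgumentException("baseline must be positive");
        }
        return (currentMs - baselineMs) * 100.0 / baselineMs;
    }

    // Produces one entry per query that appears in both runs.
    public static Map<String, Double> compare(Map<String, Double> baseline,
                                              Map<String, Double> current) {
        Map<String, Double> report = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : current.entrySet()) {
            Double base = baseline.get(entry.getKey());
            if (base != null) {
                report.put(entry.getKey(), percentDifference(base, entry.getValue()));
            }
        }
        return report;
    }

    public static void main(String[] args) {
        Map<String, Double> baseline = new LinkedHashMap<>();
        baseline.put("query1", 40.0);
        Map<String, Double> current = new LinkedHashMap<>();
        current.put("query1", 50.0);
        System.out.println(compare(baseline, current)); // {query1=25.0}
    }
}
```

    Queries missing from the baseline are skipped rather than reported, so
    newly added test queries would not raise false alarms on their first
    run.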






    On Sun, Mar 30, 2008 at 8:00 PM, Julian Hyde <jhyde (AT) pentaho (DOT) org> wrote:
    > I'm beginning to think that I need to start running a tighter ship as
    > regards performance. There have been several alleged performance slippages
    > over the past year, but we've not caught them effectively. Our process is
    > not strong enough to detect them at the time they are made, and after the
    > event it is too difficult to figure out which change out of many caused
    > performance to suffer.

  9. #19
    John V. Sichi Guest

    Re: [Mondrian] Changing the mondrian development process to prevent performance slippages

    LucidEra's setup is fairly similar to what Matt describes. We run a
    full stack (Grinder simulating browser clients sending XML report
    requests; ClearView->Mondrian->LucidDB on the server side, where
    ClearView is LucidEra's AJAX UI). For the concurrency tests, one
    configuration runs with periodic cache flushing to simulate a real-world
    environment where the reports aren't the same every time. We vary
    configurations by number of concurrent simulated users for
    concurrency/throughput testing.

    This has helped us catch many issues with Mondrian (both correctness and
    performance). We branch Mondrian in Perforce for each LucidEra release
    so that we can stabilize and maintain it, and then do perf testing
    against the mainline to know when we can safely sync back up to it for
    the next release. So as with Thomson, we don't track the effect of
    every Mondrian change (although sometimes we have to do comparative runs
    with and without selected changes to identify a culprit).

    For direct Mondrian-level testing, maybe a full client/server setup
    would be overkill. Perhaps the concurrency test suite contributed by
    Khanh Vu could be used as a basis for the multi-user performance
    simulation (with configurability added for disabling correctness testing
    to achieve high throughput, on the assumption that a correctness suite
    runs separately)? It includes some cache-flushing and mem-hungry scenarios.

    JVS


  10. #20
    John V. Sichi Guest

    Re: [Mondrian] Changing the mondrian development process to prevent performance slippages

    Julian Hyde wrote:
    > Since LucidEra and Thomson/Thoughtworks are the two largest groups besides
    > Pentaho who have an interest in developing mondrian, I would like those two
    > groups in particular to step up with suggestions and offers of help. Pentaho
    > can provide resources to run the process and publish results, but can only
    > offer limited leadership.


    Below is a little script which will sync a particular version of
    Mondrian, build it, run a query via cmdrunner using a particular
    properties file, and then grep out the execution time. We could start
    with that as a baby step of automated single-user no-cache regression
    detection.

    If Pentaho can set up exec+publishing automation for something like this
    with a few test queries, plus a way for contributors to add new ones,
    LucidEra can submit a lot of coverage queries. Publication could
    include per-query and total-time line graphs with change number as the x
    axis.

    Since a script doesn't depend on any code changes, it could easily be
    used for historical analysis as well (write a higher-level script which
    pulls all old change numbers from Perforce and collects timing from
    them, or at least for compatible queries). That's one of the reasons
    the script below copies from a clean client to a dirty temp workspace;
    that way the pull from Perforce can be incremental for each change
    number, and we don't have to worry about pollution across changes.
    Think binary search in eigenchange space for automatically finding the
    point of introduction of a regression...
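
    The binary search over change numbers could look roughly like this Java
    sketch. It assumes the oldest change in the list is known to be fast,
    the newest is known to be slow, and the slowdown persists once
    introduced. RegressionBisect, firstSlowChange, and the isSlow predicate
    are hypothetical names, and the change numbers in main are made up. In
    practice isSlow would build and time the given change, for example by
    running a script like the one in this post and comparing the result
    against a threshold.

```java
import java.util.List;
import java.util.function.IntPredicate;

public class RegressionBisect {
    /**
     * Returns the first change number (in an ascending list) for which isSlow
     * holds. Assumes the first entry is good, the last entry is bad, and every
     * change after the first bad one is also bad.
     */
    public static int firstSlowChange(List<Integer> changes, IntPredicate isSlow) {
        if (changes.size() < 2) {
            throw new IllegalArgumentException("need a good and a bad change");
        }
        int good = 0;                  // index of a change known to be fast
        int bad = changes.size() - 1;  // index of a change known to be slow
        while (bad - good > 1) {
            int mid = good + (bad - good) / 2;
            if (isSlow.test(changes.get(mid))) {
                bad = mid;   // regression is at mid or earlier
            } else {
                good = mid;  // regression is after mid
            }
        }
        return changes.get(bad);
    }

    public static void main(String[] args) {
        List<Integer> changes = List.of(100, 110, 120, 130, 140, 150);
        // Pretend the slowdown was introduced in change 130.
        System.out.println(firstSlowChange(changes, c -> c >= 130)); // 130
    }
}
```

    Each probe costs a full sync, build, and query run, so the logarithmic
    number of probes is what makes a search over many changes practical.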

    We could start with a single configuration, and then start adding more
    to the mix (as with megatest), as well as building up some performance
    analytics on that (break down by feature, contributor, etc).

    A suggestion from Stephan Zuercher: instrument Mondrian enough so that
    logical counters such as number of expression evaluations and number of
    SQL queries issued can also be included in any reporting/alerting.
    These are a lot less noisy than real-time execution metrics. (This
    wouldn't be compatible with historical analysis earlier than the
    instrumentation's point of introduction, but once it's in place, we can
    use it when we flash back to any point after that.)
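
    One way such logical counters might be collected is sketched below in
    Java. LogicalCounters and the counter names are assumptions made for
    illustration, not existing Mondrian instrumentation.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class LogicalCounters {
    // One thread-safe adder per counter name.
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Bumps the named counter, creating it on first use.
    public void increment(String name) {
        counters.computeIfAbsent(name, key -> new LongAdder()).increment();
    }

    // Current value of a counter; 0 if it was never incremented.
    public long get(String name) {
        LongAdder adder = counters.get(name);
        return adder == null ? 0 : adder.sum();
    }

    // Sorted copy of all counters, suitable for writing into a report.
    public Map<String, Long> snapshot() {
        Map<String, Long> result = new TreeMap<>();
        counters.forEach((name, adder) -> result.put(name, adder.sum()));
        return result;
    }

    public static void main(String[] args) {
        LogicalCounters counters = new LogicalCounters();
        counters.increment("sqlQueries");
        counters.increment("sqlQueries");
        counters.increment("expressionEvaluations");
        System.out.println(counters.snapshot()); // {expressionEvaluations=1, sqlQueries=2}
    }
}
```

    Because these counts depend only on what the engine did, not on how
    long it took, two runs of the same query at the same change should
    report identical values regardless of machine load.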

    JVS

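    #!/bin/bash
    # Sync one Mondrian change, build it in a scratch workspace, run a single
    # query through cmdrunner, and extract the reported execution time.

    set -e   # abort on the first failing command
    set -v   # echo each script line as it is read

    cleanpath=/apps/jvs/open/mondrian
    workpath=/apps/jvs/open/work
    changeno=11154
    propsfile=/apps/jvs/open/slamit/local.properties
    queryfile=/apps/jvs/open/slamit/query.mdx

    # Bring the clean client to the requested change number.
    p4 sync "//open/mondrian/...@${changeno}"

    # Build in a throwaway copy so the clean client is never polluted.
    rm -rf "${workpath}"
    cp -R "${cleanpath}" "${workpath}"
    cd "${workpath}"
    ant clean
    ant
    ant jar
    ant cmdrunner

    # Run the query, capturing stdout and stderr, then keep only the timing lines.
    bin/run.sh -t -p "${propsfile}" -f "${queryfile}" > query.out 2>&1
    grep "time\[" query.out > time.txt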

