Hitachi Vantara Pentaho Community Forums
Results 1 to 3 of 3

Thread: [Mondrian] Non empty optimizer performance regression

  1. #1
    Pappyn Bart Guest

    Default [Mondrian] Non empty optimizer performance regression

    Hi,

    Since the new non empty optimizer code, a few queries run very slow in
    my application (for 1 sec (before mondrian 2.3.2) -> 5 minutes).

    I tried to simulate the problem with the foodmart cube, but I am not
    successful (the foodmart schema is too simple, I guess).

    Even a simple crossjoin with two all-level sets takes a very long time,
    so the size of dimensions does not matter. I have a very large number of
    calculated members and many calculated members need calculations based
    on many other calculated members (deeply nested). The same query runs
    very fast (< 1 sec) without NON empty.

    Mondrian seems to get stuck forever in the first
    CrossJoinFunDef.nonEmptyListNEW(), it only leaves after a very long
    time.


    --> Can someone please explain to me or hint me why this function could
    be slow ? Is it due to the fact that I have large number of calculated
    members? I looked at the code, ran it through the debugger, but I don't
    understand what the function is exactly doing and why this takes such a
    long time.

    If I could understand the pitfalls of this function a bit better, I
    might be able to pinpoint (and solve) the problem.


    Where is the query that takes a long time, translated to foodmart data
    (just an example, since it runs fast on the foodmart schema) :

    WITH MEMBER [Measures].[Availability] as 'iif([Measures].[Warehouse
    Sales] > 0, [Measures].[Warehouse Profit] / [Measures].[Warehouse
    Sales], null)'
    MEMBER [Measures].[Performance] as 'iif([Measures].[Store Sales] > 0,
    [Measures].[Profit] / [Measures].[Store Sales], null)'
    MEMBER [Measures].[Quality] as 'iif([Measures].[Units Ordered] > 0,
    [Measures].[Units Shipped] / [Measures].[Units Ordered], null)'

    MEMBER [Measures].[Overall eff] as
    'iif(IsEmpty([Measures].[Availability]) or
    IsEmpty([Measures].[Performance]) or IsEmpty([Measures].[Quality]),
    null, cast([Measures].[Availability] as NUMERIC) *
    cast([Measures].[Performance] as NUMERIC) * cast([Measures].[Quality] as
    NUMERIC))'

    MEMBER [Measures].[Availability Loss tmp] as
    'iif([Measures].[Availability] > 0, 1.0 - [Measures].[Availability], 0)'
    MEMBER [Measures].[Performance Loss tmp] as
    'iif([Measures].[Performance] > 0, 1.0 - [Measures].[Performance], 0)'
    MEMBER [Measures].[Quality Loss tmp] as 'iif([Measures].[Quality] > 0,
    1.0 - [Measures].[Quality], 0)'
    MEMBER [Measures].[Sum loss tmp] as '[Measures].[Availability loss tmp]
    + [Measures].[Performance loss tmp] + [Measures].[Quality loss tmp]'
    MEMBER [Measures].[Overall eff loss tmp] as 'iif([Measures].[Overall
    eff] < 1.0, 1.0 - [Measures].[Overall eff], 0.0)'

    MEMBER [Measures].[Availability loss] as
    'iif(IsEmpty([Measures].[Overall eff]), null, iif([Measures].[Sum loss
    tmp] > 0, [Measures].[Overall eff loss tmp] * ([Measures].[Availability
    loss tmp] / [Measures].[Sum loss tmp]), 0))'

    SELECT NON EMPTY {[Measures].[Availability loss]} ON COLUMNS,
    NON EMPTY CrossJoin( {[Promotion Media].[All Media]}, {[Product].[All
    Products]} ) ON ROWS
    FROM [Warehouse and Sales]

    Thanks,
    Bart


    _______________________________________________
    Mondrian mailing list
    Mondrian (AT) pentaho (DOT) org
    http://lists.pentaho.org/mailman/listinfo/mondrian

  2. #2
    Richard Emberson Guest

    Default Re: [Mondrian] Non empty optimizer performance regression

    Look at the comment above the CrossJoinFunDef.nonEmptyList method
    as a starter.
    Assuming one wants to optimize the crossjoin for "NON EMPTY" by
    excluding Members that are null, one must look at the corner
    cases (which do not showup in Foodmart) prior to deciding to
    drop a Member from the crossjoin.
    Consider an MDX query that depend upon a calculated measure that returns
    a non-null value only if the measures it depends upon are all null -
    pathological, but possible. Or (a case that appeared in our local
    application code) there is a calculated measure that returns null
    if the Customer dimension in a query is at the ALL level and possibly
    non-null (depending upon the dataset) when the Customer dimension was
    at a non-ALL level - this calculated measure directly breaks the
    assumption that if there is no data at the top level for a
    given dimension, then there is no data for any lower level in that
    dimension - again, pathological but possibly.
    In a sense, the current "optimization" code attempts to account for
    the corner cases ... making it rather slower.
    So, does one want slow and correct, or fast and possibly wrong?

    I'd love for someone to figure out how to make it fast and correct.

    Using the CSV testcase loading capabilities in Mondrian, one can
    create new tables, dimensions and cubes that allow one to create
    and test pathological cases not found in Foodmart. See the
    test code that use *.csv files.


    Richard



    Pappyn Bart wrote:
    > Hi,
    >
    > Since the new non empty optimizer code, a few queries run very slow in
    > my application (for 1 sec (before mondrian 2.3.2) -> 5 minutes).
    >
    > I tried to simulate the problem with the foodmart cube, but I am not
    > successful (the foodmart schema is too simple, I guess).
    >
    > Even a simple crossjoin with two all-level sets takes a very long time,
    > so the size of dimensions does not matter. I have a very large number of
    > calculated members and many calculated members need calculations based
    > on many other calculated members (deeply nested). The same query runs
    > very fast (< 1 sec) without NON empty.
    >
    > Mondrian seems to get stuck forever in the first
    > CrossJoinFunDef.nonEmptyListNEW(), it only leaves after a very long time.
    >
    >
    > --> Can someone please explain to me or hint me why this function could
    > be slow ? Is it due to the fact that I have large number of calculated
    > members? I looked at the code, ran it through the debugger, but I don't
    > understand what the function is exactly doing and why this takes such a
    > long time.
    >
    > If I could understand the pitfalls of this function a bit better, I
    > might be able to pinpoint (and solve) the problem.
    >
    >
    > Where is the query that takes a long time, translated to foodmart data
    > (just an example, since it runs fast on the foodmart schema) :
    >
    > WITH MEMBER [Measures].[Availability] as 'iif([Measures].[Warehouse
    > Sales] > 0, [Measures].[Warehouse Profit] / [Measures].[Warehouse
    > Sales], null)'
    > MEMBER [Measures].[Performance] as 'iif([Measures].[Store Sales] > 0,
    > [Measures].[Profit] / [Measures].[Store Sales], null)'
    > MEMBER [Measures].[Quality] as 'iif([Measures].[Units Ordered] > 0,
    > [Measures].[Units Shipped] / [Measures].[Units Ordered], null)'
    >
    > MEMBER [Measures].[Overall eff] as
    > 'iif(IsEmpty([Measures].[Availability]) or
    > IsEmpty([Measures].[Performance]) or IsEmpty([Measures].[Quality]),
    > null, cast([Measures].[Availability] as NUMERIC) *
    > cast([Measures].[Performance] as NUMERIC) * cast([Measures].[Quality] as
    > NUMERIC))'
    >
    > MEMBER [Measures].[Availability Loss tmp] as
    > 'iif([Measures].[Availability] > 0, 1.0 - [Measures].[Availability], 0)'
    > MEMBER [Measures].[Performance Loss tmp] as
    > 'iif([Measures].[Performance] > 0, 1.0 - [Measures].[Performance], 0)'
    > MEMBER [Measures].[Quality Loss tmp] as 'iif([Measures].[Quality] > 0,
    > 1.0 - [Measures].[Quality], 0)'
    > MEMBER [Measures].[Sum loss tmp] as '[Measures].[Availability loss tmp]
    > + [Measures].[Performance loss tmp] + [Measures].[Quality loss tmp]'
    > MEMBER [Measures].[Overall eff loss tmp] as 'iif([Measures].[Overall
    > eff] < 1.0, 1.0 - [Measures].[Overall eff], 0.0)'
    >
    > MEMBER [Measures].[Availability loss] as
    > 'iif(IsEmpty([Measures].[Overall eff]), null, iif([Measures].[Sum loss
    > tmp] > 0, [Measures].[Overall eff loss tmp] * ([Measures].[Availability
    > loss tmp] / [Measures].[Sum loss tmp]), 0))'
    >
    > SELECT NON EMPTY {[Measures].[Availability loss]} ON COLUMNS,
    > NON EMPTY CrossJoin( {[Promotion Media].[All Media]}, {[Product].[All
    > Products]} ) ON ROWS
    > FROM [Warehouse and Sales]
    >
    > Thanks,
    > Bart
    >
    >
    >
    > ------------------------------------------------------------------------
    >
    > _______________________________________________
    > Mondrian mailing list
    > Mondrian (AT) pentaho (DOT) org
    > http://lists.pentaho.org/mailman/listinfo/mondrian



    --
    Quis custodiet ipsos custodes:
    This email message is for the sole use of the intended recipient(s) and
    may contain confidential information. Any unauthorized review, use,
    disclosure or distribution is prohibited. If you are not the intended
    recipient, please contact the sender by reply email and destroy all
    copies of the original message.
    _______________________________________________
    Mondrian mailing list
    Mondrian (AT) pentaho (DOT) org
    http://lists.pentaho.org/mailman/listinfo/mondrian

  3. #3
    Julian Hyde Guest

    Default RE: [Mondrian] Non empty optimizer performance regression

    Bart,

    Try turning on SQL tracing. See if one particular statement is taking a
    lot of time, or if one statement is being executed repeatedly.

    I once saw a problem with non-empty where a sub-expression was [A
    member].Children and this sub-expression was being executed repeatedly
    to find the non-empty children of [A member] in each context, and each
    execution required the execution of a SQL statement. It would have been
    much cheaper to execute the query once, return slightly too many
    members, and cache the result, rather than executing SQL multiple times.
    I don't know whether your problem is related to that.

    Julian


    _____

    From: mondrian-bounces (AT) pentaho (DOT) org [mailto:mondrian-bounces (AT) pentaho (DOT) org]
    On Behalf Of Pappyn Bart
    Sent: Friday, May 04, 2007 2:33 AM
    To: Mondrian developer mailing list
    Subject: [Mondrian] Non empty optimizer performance regression


    Hi,

    Since the new non empty optimizer code, a few queries run very slow in
    my application (for 1 sec (before mondrian 2.3.2) -> 5 minutes).

    I tried to simulate the problem with the foodmart cube, but I am not
    successful (the foodmart schema is too simple, I guess).

    Even a simple crossjoin with two all-level sets takes a very long time,
    so the size of dimensions does not matter. I have a very large number of
    calculated members and many calculated members need calculations based
    on many other calculated members (deeply nested). The same query runs
    very fast (< 1 sec) without NON empty.

    Mondrian seems to get stuck forever in the first
    CrossJoinFunDef.nonEmptyListNEW(), it only leaves after a very long
    time.


    --> Can someone please explain to me or hint me why this function could
    be slow ? Is it due to the fact that I have large number of calculated
    members? I looked at the code, ran it through the debugger, but I don't
    understand what the function is exactly doing and why this takes such a
    long time.

    If I could understand the pitfalls of this function a bit better, I
    might be able to pinpoint (and solve) the problem.


    Where is the query that takes a long time, translated to foodmart data
    (just an example, since it runs fast on the foodmart schema) :

    WITH MEMBER [Measures].[Availability] as 'iif([Measures].[Warehouse
    Sales] > 0, [Measures].[Warehouse Profit] / [Measures].[Warehouse
    Sales], null)'
    MEMBER [Measures].[Performance] as 'iif([Measures].[Store Sales] > 0,
    [Measures].[Profit] / [Measures].[Store Sales], null)'
    MEMBER [Measures].[Quality] as 'iif([Measures].[Units Ordered] > 0,
    [Measures].[Units Shipped] / [Measures].[Units Ordered], null)'

    MEMBER [Measures].[Overall eff] as
    'iif(IsEmpty([Measures].[Availability]) or
    IsEmpty([Measures].[Performance]) or IsEmpty([Measures].[Quality]),
    null, cast([Measures].[Availability] as NUMERIC) *
    cast([Measures].[Performance] as NUMERIC) * cast([Measures].[Quality] as
    NUMERIC))'

    MEMBER [Measures].[Availability Loss tmp] as
    'iif([Measures].[Availability] > 0, 1.0 - [Measures].[Availability], 0)'
    MEMBER [Measures].[Performance Loss tmp] as
    'iif([Measures].[Performance] > 0, 1.0 - [Measures].[Performance], 0)'
    MEMBER [Measures].[Quality Loss tmp] as 'iif([Measures].[Quality] > 0,
    1.0 - [Measures].[Quality], 0)'
    MEMBER [Measures].[Sum loss tmp] as '[Measures].[Availability loss tmp]
    + [Measures].[Performance loss tmp] + [Measures].[Quality loss tmp]'
    MEMBER [Measures].[Overall eff loss tmp] as 'iif([Measures].[Overall
    eff] < 1.0, 1.0 - [Measures].[Overall eff], 0.0)'

    MEMBER [Measures].[Availability loss] as
    'iif(IsEmpty([Measures].[Overall eff]), null, iif([Measures].[Sum loss
    tmp] > 0, [Measures].[Overall eff loss tmp] * ([Measures].[Availability
    loss tmp] / [Measures].[Sum loss tmp]), 0))'

    SELECT NON EMPTY {[Measures].[Availability loss]} ON COLUMNS,
    NON EMPTY CrossJoin( {[Promotion Media].[All Media]}, {[Product].[All
    Products]} ) ON ROWS
    FROM [Warehouse and Sales]

    Thanks,
    Bart



    _______________________________________________
    Mondrian mailing list
    Mondrian (AT) pentaho (DOT) org
    http://lists.pentaho.org/mailman/listinfo/mondrian

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.