PDA

View Full Version : [Mondrian] Non empty optimizer performance regression



Pappyn Bart
05-04-2007, 05:40 AM
Hi,

Since the new non empty optimizer code, a few queries run very slow in
my application (for 1 sec (before mondrian 2.3.2) -> 5 minutes).

I tried to simulate the problem with the foodmart cube, but I am not
successful (the foodmart schema is too simple, I guess).

Even a simple crossjoin with two all-level sets takes a very long time,
so the size of dimensions does not matter. I have a very large number of
calculated members and many calculated members need calculations based
on many other calculated members (deeply nested). The same query runs
very fast (< 1 sec) without NON empty.

Mondrian seems to get stuck forever in the first
CrossJoinFunDef.nonEmptyListNEW(), it only leaves after a very long
time.


--> Can someone please explain to me or hint me why this function could
be slow ? Is it due to the fact that I have large number of calculated
members? I looked at the code, ran it through the debugger, but I don't
understand what the function is exactly doing and why this takes such a
long time.

If I could understand the pitfalls of this function a bit better, I
might be able to pinpoint (and solve) the problem.


Where is the query that takes a long time, translated to foodmart data
(just an example, since it runs fast on the foodmart schema) :

WITH MEMBER [Measures].[Availability] as 'iif([Measures].[Warehouse
Sales] > 0, [Measures].[Warehouse Profit] / [Measures].[Warehouse
Sales], null)'
MEMBER [Measures].[Performance] as 'iif([Measures].[Store Sales] > 0,
[Measures].[Profit] / [Measures].[Store Sales], null)'
MEMBER [Measures].[Quality] as 'iif([Measures].[Units Ordered] > 0,
[Measures].[Units Shipped] / [Measures].[Units Ordered], null)'

MEMBER [Measures].[Overall eff] as
'iif(IsEmpty([Measures].[Availability]) or
IsEmpty([Measures].[Performance]) or IsEmpty([Measures].[Quality]),
null, cast([Measures].[Availability] as NUMERIC) *
cast([Measures].[Performance] as NUMERIC) * cast([Measures].[Quality] as
NUMERIC))'

MEMBER [Measures].[Availability Loss tmp] as
'iif([Measures].[Availability] > 0, 1.0 - [Measures].[Availability], 0)'
MEMBER [Measures].[Performance Loss tmp] as
'iif([Measures].[Performance] > 0, 1.0 - [Measures].[Performance], 0)'
MEMBER [Measures].[Quality Loss tmp] as 'iif([Measures].[Quality] > 0,
1.0 - [Measures].[Quality], 0)'
MEMBER [Measures].[Sum loss tmp] as '[Measures].[Availability loss tmp]
+ [Measures].[Performance loss tmp] + [Measures].[Quality loss tmp]'
MEMBER [Measures].[Overall eff loss tmp] as 'iif([Measures].[Overall
eff] < 1.0, 1.0 - [Measures].[Overall eff], 0.0)'

MEMBER [Measures].[Availability loss] as
'iif(IsEmpty([Measures].[Overall eff]), null, iif([Measures].[Sum loss
tmp] > 0, [Measures].[Overall eff loss tmp] * ([Measures].[Availability
loss tmp] / [Measures].[Sum loss tmp]), 0))'

SELECT NON EMPTY {[Measures].[Availability loss]} ON COLUMNS,
NON EMPTY CrossJoin( {[Promotion Media].[All Media]}, {[Product].[All
Products]} ) ON ROWS
FROM [Warehouse and Sales]

Thanks,
Bart


_______________________________________________
Mondrian mailing list
Mondrian (AT) pentaho (DOT) org
http://lists.pentaho.org/mailman/listinfo/mondrian

Richard Emberson
05-04-2007, 09:40 AM
Look at the comment above the CrossJoinFunDef.nonEmptyList method
as a starter.
Assuming one wants to optimize the crossjoin for "NON EMPTY" by
excluding Members that are null, one must look at the corner
cases (which do not showup in Foodmart) prior to deciding to
drop a Member from the crossjoin.
Consider an MDX query that depend upon a calculated measure that returns
a non-null value only if the measures it depends upon are all null -
pathological, but possible. Or (a case that appeared in our local
application code) there is a calculated measure that returns null
if the Customer dimension in a query is at the ALL level and possibly
non-null (depending upon the dataset) when the Customer dimension was
at a non-ALL level - this calculated measure directly breaks the
assumption that if there is no data at the top level for a
given dimension, then there is no data for any lower level in that
dimension - again, pathological but possibly.
In a sense, the current "optimization" code attempts to account for
the corner cases ... making it rather slower.
So, does one want slow and correct, or fast and possibly wrong?

I'd love for someone to figure out how to make it fast and correct.

Using the CSV testcase loading capabilities in Mondrian, one can
create new tables, dimensions and cubes that allow one to create
and test pathological cases not found in Foodmart. See the
test code that use *.csv files.


Richard



Pappyn Bart wrote:
> Hi,
>
> Since the new non empty optimizer code, a few queries run very slow in
> my application (for 1 sec (before mondrian 2.3.2) -> 5 minutes).
>
> I tried to simulate the problem with the foodmart cube, but I am not
> successful (the foodmart schema is too simple, I guess).
>
> Even a simple crossjoin with two all-level sets takes a very long time,
> so the size of dimensions does not matter. I have a very large number of
> calculated members and many calculated members need calculations based
> on many other calculated members (deeply nested). The same query runs
> very fast (< 1 sec) without NON empty.
>
> Mondrian seems to get stuck forever in the first
> CrossJoinFunDef.nonEmptyListNEW(), it only leaves after a very long time.
>
>
> --> Can someone please explain to me or hint me why this function could
> be slow ? Is it due to the fact that I have large number of calculated
> members? I looked at the code, ran it through the debugger, but I don't
> understand what the function is exactly doing and why this takes such a
> long time.
>
> If I could understand the pitfalls of this function a bit better, I
> might be able to pinpoint (and solve) the problem.
>
>
> Where is the query that takes a long time, translated to foodmart data
> (just an example, since it runs fast on the foodmart schema) :
>
> WITH MEMBER [Measures].[Availability] as 'iif([Measures].[Warehouse
> Sales] > 0, [Measures].[Warehouse Profit] / [Measures].[Warehouse
> Sales], null)'
> MEMBER [Measures].[Performance] as 'iif([Measures].[Store Sales] > 0,
> [Measures].[Profit] / [Measures].[Store Sales], null)'
> MEMBER [Measures].[Quality] as 'iif([Measures].[Units Ordered] > 0,
> [Measures].[Units Shipped] / [Measures].[Units Ordered], null)'
>
> MEMBER [Measures].[Overall eff] as
> 'iif(IsEmpty([Measures].[Availability]) or
> IsEmpty([Measures].[Performance]) or IsEmpty([Measures].[Quality]),
> null, cast([Measures].[Availability] as NUMERIC) *
> cast([Measures].[Performance] as NUMERIC) * cast([Measures].[Quality] as
> NUMERIC))'
>
> MEMBER [Measures].[Availability Loss tmp] as
> 'iif([Measures].[Availability] > 0, 1.0 - [Measures].[Availability], 0)'
> MEMBER [Measures].[Performance Loss tmp] as
> 'iif([Measures].[Performance] > 0, 1.0 - [Measures].[Performance], 0)'
> MEMBER [Measures].[Quality Loss tmp] as 'iif([Measures].[Quality] > 0,
> 1.0 - [Measures].[Quality], 0)'
> MEMBER [Measures].[Sum loss tmp] as '[Measures].[Availability loss tmp]
> + [Measures].[Performance loss tmp] + [Measures].[Quality loss tmp]'
> MEMBER [Measures].[Overall eff loss tmp] as 'iif([Measures].[Overall
> eff] < 1.0, 1.0 - [Measures].[Overall eff], 0.0)'
>
> MEMBER [Measures].[Availability loss] as
> 'iif(IsEmpty([Measures].[Overall eff]), null, iif([Measures].[Sum loss
> tmp] > 0, [Measures].[Overall eff loss tmp] * ([Measures].[Availability
> loss tmp] / [Measures].[Sum loss tmp]), 0))'
>
> SELECT NON EMPTY {[Measures].[Availability loss]} ON COLUMNS,
> NON EMPTY CrossJoin( {[Promotion Media].[All Media]}, {[Product].[All
> Products]} ) ON ROWS
> FROM [Warehouse and Sales]
>
> Thanks,
> Bart
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Mondrian mailing list
> Mondrian (AT) pentaho (DOT) org
> http://lists.pentaho.org/mailman/listinfo/mondrian


--
Quis custodiet ipsos custodes:
This email message is for the sole use of the intended recipient(s) and
may contain confidential information. Any unauthorized review, use,
disclosure or distribution is prohibited. If you are not the intended
recipient, please contact the sender by reply email and destroy all
copies of the original message.
_______________________________________________
Mondrian mailing list
Mondrian (AT) pentaho (DOT) org
http://lists.pentaho.org/mailman/listinfo/mondrian

Julian Hyde
05-07-2007, 08:13 PM
Bart,

Try turning on SQL tracing. See if one particular statement is taking a
lot of time, or if one statement is being executed repeatedly.

I once saw a problem with non-empty where a sub-expression was [A
member].Children and this sub-expression was being executed repeatedly
to find the non-empty children of [A member] in each context, and each
execution required the execution of a SQL statement. It would have been
much cheaper to execute the query once, return slightly too many
members, and cache the result, rather than executing SQL multiple times.
I don't know whether your problem is related to that.

Julian


_____

From: mondrian-bounces (AT) pentaho (DOT) org [mailto:mondrian-bounces (AT) pentaho (DOT) org]
On Behalf Of Pappyn Bart
Sent: Friday, May 04, 2007 2:33 AM
To: Mondrian developer mailing list
Subject: [Mondrian] Non empty optimizer performance regression


Hi,

Since the new non empty optimizer code, a few queries run very slow in
my application (for 1 sec (before mondrian 2.3.2) -> 5 minutes).

I tried to simulate the problem with the foodmart cube, but I am not
successful (the foodmart schema is too simple, I guess).

Even a simple crossjoin with two all-level sets takes a very long time,
so the size of dimensions does not matter. I have a very large number of
calculated members and many calculated members need calculations based
on many other calculated members (deeply nested). The same query runs
very fast (< 1 sec) without NON empty.

Mondrian seems to get stuck forever in the first
CrossJoinFunDef.nonEmptyListNEW(), it only leaves after a very long
time.


--> Can someone please explain to me or hint me why this function could
be slow ? Is it due to the fact that I have large number of calculated
members? I looked at the code, ran it through the debugger, but I don't
understand what the function is exactly doing and why this takes such a
long time.

If I could understand the pitfalls of this function a bit better, I
might be able to pinpoint (and solve) the problem.


Where is the query that takes a long time, translated to foodmart data
(just an example, since it runs fast on the foodmart schema) :

WITH MEMBER [Measures].[Availability] as 'iif([Measures].[Warehouse
Sales] > 0, [Measures].[Warehouse Profit] / [Measures].[Warehouse
Sales], null)'
MEMBER [Measures].[Performance] as 'iif([Measures].[Store Sales] > 0,
[Measures].[Profit] / [Measures].[Store Sales], null)'
MEMBER [Measures].[Quality] as 'iif([Measures].[Units Ordered] > 0,
[Measures].[Units Shipped] / [Measures].[Units Ordered], null)'

MEMBER [Measures].[Overall eff] as
'iif(IsEmpty([Measures].[Availability]) or
IsEmpty([Measures].[Performance]) or IsEmpty([Measures].[Quality]),
null, cast([Measures].[Availability] as NUMERIC) *
cast([Measures].[Performance] as NUMERIC) * cast([Measures].[Quality] as
NUMERIC))'

MEMBER [Measures].[Availability Loss tmp] as
'iif([Measures].[Availability] > 0, 1.0 - [Measures].[Availability], 0)'
MEMBER [Measures].[Performance Loss tmp] as
'iif([Measures].[Performance] > 0, 1.0 - [Measures].[Performance], 0)'
MEMBER [Measures].[Quality Loss tmp] as 'iif([Measures].[Quality] > 0,
1.0 - [Measures].[Quality], 0)'
MEMBER [Measures].[Sum loss tmp] as '[Measures].[Availability loss tmp]
+ [Measures].[Performance loss tmp] + [Measures].[Quality loss tmp]'
MEMBER [Measures].[Overall eff loss tmp] as 'iif([Measures].[Overall
eff] < 1.0, 1.0 - [Measures].[Overall eff], 0.0)'

MEMBER [Measures].[Availability loss] as
'iif(IsEmpty([Measures].[Overall eff]), null, iif([Measures].[Sum loss
tmp] > 0, [Measures].[Overall eff loss tmp] * ([Measures].[Availability
loss tmp] / [Measures].[Sum loss tmp]), 0))'

SELECT NON EMPTY {[Measures].[Availability loss]} ON COLUMNS,
NON EMPTY CrossJoin( {[Promotion Media].[All Media]}, {[Product].[All
Products]} ) ON ROWS
FROM [Warehouse and Sales]

Thanks,
Bart



_______________________________________________
Mondrian mailing list
Mondrian (AT) pentaho (DOT) org
http://lists.pentaho.org/mailman/listinfo/mondrian