PDA

View Full Version : Re : [Mondrian] Multi-threading SQL execution



michael bienstein
02-08-2007, 07:31 PM
Yes, there has been thought about it. I'm sure Julian can fill you in.

The main issues I see are:
1) Either you let the database do the parallelism, in which case you need to issue effectively multiple GROUP BYs in one SELECT (which is possible on some databases such as ORACLE and even MySQL with its ROLLUP option) and then disentangle the results,
but this is not a standard in SQL.
2) Or you are really running two parallel queries in two separate connections (almost no JDBC driver will allow parallel queries. The JTS/JTA maps transaction context to thread and JDBC drivers must obey this).
3) Or you have separate databases and have obviously different Connection objects open.

In the options (2) and (3) you need to revisit the way JDBC connections are handled by Mondrian. In option (1) you need to maintain different SQL generation strategies. All nice ideas but a lot of work. Yes I've thought about it. But who can do it? Frustrating.

Michael

----- Message d'origine ----
De : Matt Campbell <mkambol (AT) gmail (DOT) com>

John V. Sichi
02-09-2007, 07:00 AM
michael bienstein wrote:
> The main issues I see are:
> 1) Either you let the database do the parallelism, in which case you
> need to issue effectively multiple GROUP BYs in one SELECT (which is
> possible on some databases such as ORACLE and even MySQL with its ROLLUP
> option) and then disentangle the results,
> but this is not a standard in SQL.

Actually, SQL/OLAP (including CUBE, ROLLUP, and window clauses) is now
standard in SQL:2003, but de facto, yeah, you can't rely on it as
universal yet.

> In the options (2) and (3) you need to revisit the way JDBC connections
> are handled by Mondrian. In option (1) you need to maintain different
> SQL generation strategies. All nice ideas but a lot of work. Yes I've
> thought about it. But who can do it? Frustrating.

Long term, LucidEra will probably work on (1). We're planning to start
on subtotal calculated member pushdown first. Once LucidDB
(http://www.luciddb.org) can handle SQL/OLAP, we'll want Mondrian to
generate it.

JVS
_______________________________________________
Mondrian mailing list
Mondrian (AT) pentaho (DOT) org
http://lists.pentaho.org/mailman/listinfo/mondrian

Matt Campbell
02-09-2007, 12:21 PM
Thanks for the responses, Michael and John.

We had been thinking about both 1 and 2. Your right that maintaining
different SQL generation strategies could be tricky, but the enhancement
could have a huge impact. Our DBA says 3-5 times performance boost when
using CUBE() over separate queries.

We may be willing to take on the work, but we'll need help identifying an
approach.

Michael, regarding the optimization you mentioned in the next release-- that
would not help with distinct count measures, is that correct? What I've
been told is that at least some databases (Oracle, DB2) will correctly
compute count distincts when using CUBE and ROLLUP.

I also had a question about option (2): You mentioned that running two
parallel queries won't work because trasaction context is mapped to the
thread. If we're spawning separate threads for each query execution,
wouldn't that get past the limitation?



On 2/8/07, michael bienstein <mbienstein (AT) yahoo (DOT) fr> wrote:[color=blue]
>
> Yes, there has been thought about it. I'm sure Julian can fill you in.
>
> The main issues I see are:
> 1) Either you let the database do the parallelism, in which case you need
> to issue effectively multiple GROUP BYs in one SELECT (which is possible on
> some databases such as ORACLE and even MySQL with its ROLLUP option) and
> then disentangle the results,
> but this is not a standard in SQL.
> 2) Or you are really running two parallel queries in two separate
> connections (almost no JDBC driver will allow parallel queries. The JTS/JTA
> maps transaction context to thread and JDBC drivers must obey this).
> 3) Or you have separate databases and have obviously different Connection
> objects open.
>
> In the options (2) and (3) you need to revisit the way JDBC connections
> are handled by Mondrian. In option (1) you need to maintain different SQL
> generation strategies. All nice ideas but a lot of work. Yes I've thought
> about it. But who can do it? Frustrating.
>
> Michael
>
> ----- Message d'origine ----
> De : Matt Campbell <mkambol (AT) gmail (DOT) com>
>

Julian Hyde
02-12-2007, 09:40 PM
Oops, just noticed that Michael had already enumerated my option #3.

There's no such thing as a simple feature -- for instance, I've been working
on another 'simple feature' (cache flush) for 5 months now. :)

The simplest implementation of in-cache rollup would look what is already in
cache, but to meet Matt's objectives we would need to go further: predict
what is going to be in cache after the requests are complete, and defer
requests which can be subsumed by other requests.

Julian


_____

From: mondrian-bounces (AT) pentaho (DOT) org [mailto:mondrian-bounces (AT) pentaho (DOT) org] On
Behalf Of michael bienstein
Sent: Friday, February 09, 2007 3:19 AM
To: Mondrian developer mailing list
Subject: Re : Re : [Mondrian] Multi-threading SQL execution


There's actually a far simpler optimization to be done in the next release I
hope that can speed things up a lot in this regard that doesn't involve
going to the database at all: rollups from cached cells. Currently if a
Cell is requested and it is not in cache, it will be fetched from the
database even if all the cells that are needed to calculate its value *are*
in the cell cache.

Michael


_____

D