
Re: Re: [Mondrian] Multi-threading SQL execution



michael bienstein
02-09-2007, 01:10 PM
Matt,
I'm in the forums, but have no say in what goes into the next release. I am only suggesting it because I think the work is not huge and it is a performance improvement that doesn't break the existing architecture. Talk to Julian about it.

As for distinct count, there are two types of rollups that get done. You can actually roll up some distinct counts if the attribute being counted (distinct is just like grouping, except that you don't record any statistics per group) is part of the grouping. For example, if you want to find the number of distinct values of SALESPERSON and you are grouping over the organisation hierarchy, then you can probably just use SUM. This is already built into Mondrian.
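To make the SUM shortcut concrete, here is a minimal sketch in Java. It is not Mondrian's code: the class name, the method names and the (org unit, salesperson) row layout are all made up for illustration. The assumption behind the shortcut is that each SALESPERSON belongs to exactly one organisation unit, so the per-unit distinct counts never overlap and can simply be added up.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class DistinctCountRollup {

    /** Distinct salespeople per org unit; each row is {orgUnit, salesperson} (hypothetical layout). */
    public static Map<String, Integer> distinctPerUnit(String[][] rows) {
        Map<String, Set<String>> seen = new TreeMap<>();
        for (String[] row : rows) {
            seen.computeIfAbsent(row[0], unit -> new HashSet<>()).add(row[1]);
        }
        Map<String, Integer> counts = new TreeMap<>();
        seen.forEach((unit, people) -> counts.put(unit, people.size()));
        return counts;
    }

    /** Roll-up by SUM: only equals the true distinct count if no salesperson spans two units. */
    public static int sumOfUnitCounts(String[][] rows) {
        int total = 0;
        for (int count : distinctPerUnit(rows).values()) {
            total += count;
        }
        return total;
    }

    /** Reference answer: the distinct count computed directly over all rows. */
    public static int distinctOverall(String[][] rows) {
        Set<String> people = new HashSet<>();
        for (String[] row : rows) {
            people.add(row[1]);
        }
        return people.size();
    }
}
```

With rows {East, Ann}, {East, Ann}, {East, Bob}, {West, Cy} both sumOfUnitCounts and distinctOverall give 3. If Ann also appears under West, for example {East, Ann}, {East, Bob}, {West, Ann}, the SUM gives 3 while the true distinct count is 2, which is why the shortcut only applies when the counted attribute is nested inside the grouping.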

As for how that plays out in certain databases which use CUBE, ROLLUP and GROUPING SETS (I think that's the Oracle syntax), I have to admit that I'm ignorant. I know it exists but that's about it. I come from SAS, where we use PROC SUMMARY for that sort of thing, and it doesn't handle distinct counts.

As for transaction contexts, you have a problem. Most of the time your datastore is not changing and you perform read-only operations. Under these circumstances you do not need to re-use the same transaction context. In fact, Mondrian at present doesn't use the same transaction for the different GROUP BYs on the fact table that it issues to service a single MDX query. Transaction contexts only become important when you want a consistent view of the data and the underlying data store is dynamically changing.

The problem is that even though we could in most cases make do with a non-transactional data store, all RDBMSs are transactional systems. They all use the CLI mapping of transaction to thread. That means that you use just one thread to get a connection, issue a query, then another, etc., then close the connection. If you want to issue parallel SQL queries on the same DB, you need multiple threads opening different Connections that almost certainly don't share a genuine transaction context. In Java EE (and J2EE before it) there is still no standard way to launch a separate thread. You only service requests in the thread provided by the container. There are non-official 'standards' to do this, such as IBM/BEA's CommonJ and JBoss's Asynchronous Beans. I've actually thought about this more in the context of partitioning along members than aggregations. Look at this thread to see how I have been thinking about this:
http://www.theserverside.com/news/thread.tss?thread_id=43916#225895
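
The "one thread, one Connection" pattern described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Mondrian's implementation: ParallelSegmentLoader, ConnectionSource and SqlTask are hypothetical names, and the plain java.util.concurrent thread pool stands in for whatever a container would actually allow (inside Java EE you would have to go through something like a CommonJ WorkManager instead of creating threads yourself).

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSegmentLoader {

    /** Hypothetical hook that hands out a fresh JDBC connection, e.g. dataSource::getConnection. */
    public interface ConnectionSource {
        Connection open() throws SQLException;
    }

    /** Hypothetical unit of work: one GROUP BY query executed on its own connection. */
    public interface SqlTask<T> {
        T run(Connection connection) throws SQLException;
    }

    /** Runs every task on a worker thread and returns the results in task order. */
    public static <T> List<T> runAll(ConnectionSource source, List<SqlTask<T>> tasks, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<T>> jobs = new ArrayList<>();
            for (SqlTask<T> task : tasks) {
                jobs.add(() -> {
                    // Each worker opens and closes its own connection, so the
                    // queries do NOT share a transaction context.
                    try (Connection connection = source.open()) {
                        return task.run(connection);
                    }
                });
            }
            List<T> results = new ArrayList<>();
            for (Future<T> future : pool.invokeAll(jobs)) {
                // A failed query surfaces here as an ExecutionException.
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because every task gets its own Connection, two GROUP BYs issued this way may see different states of the fact table if the data is changing underneath them. That is acceptable for the read-only case discussed above, but not if you need a consistent view.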

Good weekend, Michael






