
View Full Version : RE: [Mondrian] Re: VirtualCubeTest.testCalculatedMemberAcrossCubes failing on SMP



Pappyn Bart
01-23-2007, 07:30 AM
I was also thinking of moving to ThreadLocal; I will pick up your modifications as soon as they
are checked in - thanks.

In the meantime I have developed some new features, including the possibility to attach a data source
listener to a star.

Because there are other multi-user problems related to loading aggregates, I designed a new scheme.

The aggregate cache is now checked before a query is run. If any aggregation has changed due to changes in the database,
a new - thread-local - aggregation is made, and the thread fills it. Other threads that were using the
global (non-thread-local) cache are no longer disturbed (as was the case in the past). After the thread has finished,
it tries to move the local cache into the global cache, provided no other thread is using that aggregation.

A creation time is maintained, so only the latest changes are put into the global cache.

If the thread has caching turned off, the principle that is currently in place stays the same: the thread-local cache
is flushed after the query has finished.
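
In rough code form, the scheme looks like this (a hand-written illustration of the idea, not the checked-in code; BitKey and Aggregation are simplified stand-ins for the mondrian classes):

import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for the real mondrian classes.
class BitKey {
}

class Aggregation {
    private final long creationTime = System.currentTimeMillis();
    private int readers;                  // threads currently reading

    long creationTime() { return creationTime; }
    synchronized boolean inUse() { return readers > 0; }
}

class AggregationCache {
    private final Map<BitKey, Aggregation> globalCache =
        new HashMap<BitKey, Aggregation>();
    private final ThreadLocal<Map<BitKey, Aggregation>> localCache =
        new ThreadLocal<Map<BitKey, Aggregation>>() {
            protected Map<BitKey, Aggregation> initialValue() {
                return new HashMap<BitKey, Aggregation>();
            }
        };

    // Called before the query runs. If the database changed, work on a
    // fresh thread-local aggregation so that threads reading the global
    // cache are not disturbed.
    Aggregation lookup(BitKey key, boolean databaseChanged) {
        if (databaseChanged) {
            Map<BitKey, Aggregation> local = localCache.get();
            Aggregation agg = local.get(key);
            if (agg == null) {
                agg = new Aggregation();  // this thread will fill it
                local.put(key, agg);
            }
            return agg;
        }
        synchronized (globalCache) {
            return globalCache.get(key);
        }
    }

    // Called after the query finishes: promote the local copy, but only
    // if no other thread is using the global entry, and only if the
    // local copy is newer (the creation time decides).
    void promote(BitKey key) {
        Aggregation local = localCache.get().remove(key);
        if (local == null) {
            return;
        }
        synchronized (globalCache) {
            Aggregation global = globalCache.get(key);
            if (global == null
                || (!global.inUse()
                    && global.creationTime() < local.creationTime())) {
                globalCache.put(key, local);
            }
        }
    }
}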

Later on, once transactions are put in place (I will not do this in the upcoming patch), I think multi-user access
and data integrity will be much better.

The only problem - up to now - is the hierarchy cache, since it does not follow the execution of a query; but
I am thinking of applying the same principle as for the aggregate cache.

Bart

________________________________

From: mondrian-bounces (AT) pentaho (DOT) org [mailto:mondrian-bounces (AT) pentaho (DOT) org] On Behalf Of Julian Hyde
Sent: Tuesday 23 January 2007 11:57
To: 'Mondrian developer mailing list'
Subject: RE: [Mondrian] Re: VirtualCubeTest.testCalculatedMemberAcrossCubes failing on SMP


I think the problem is with how mondrian evaluates members using multiple passes. When the measures come from a virtual cube, there are of course multiple real cubes, and each of those has its own cell reader. But the code in RolapResult assumes there is only one cell reader.

Mondrian should check the cell readers for all applicable cubes, and only emit a result when all cell readers have been populated.
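
Something along these lines (a sketch only; CellReader stands in for the mondrian interface, and missedCellCount() is a hypothetical method meaning "how many cell requests did this reader record that are not loaded yet?"):

import java.util.List;

interface CellReader {
    int missedCellCount();  // hypothetical: pending, unloaded requests
}

class VirtualCubeCheck {
    // Only emit a result when every applicable cube's cell reader has
    // been populated; otherwise run another pass to load missing cells.
    static boolean allReadersPopulated(List<CellReader> cellReaders) {
        for (CellReader reader : cellReaders) {
            if (reader.missedCellCount() > 0) {
                return false;
            }
        }
        return true;
    }
}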

I haven't implemented the fix yet, but this cause seems very plausible to me.

I'm not exactly sure why this problem surfaced after Bart's change - maybe thread-local caches increased the chances of one cache being populated and another not - or why it appears on SMP machines.

By the way, in an effort to get this working, I removed Bart's RolapStarAggregationKey (a compound key of BitKey and thread id) and moved to a two-tier hashing scheme. The first tier is a ThreadLocal of maps, and the second tier is a map. Threads which want access to the global map just skip the first tier. Given the difficulties obtaining a unique id for a thread, using a ThreadLocal seemed cleaner. So, even though this didn't fix the bug, I'm going to check in.
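
Schematically, the two tiers look like this (simplified stand-in code, not the actual checked-in change):

import java.util.HashMap;
import java.util.Map;

// First tier: a ThreadLocal of maps. Second tier: the shared map.
// Threads that want the global view simply skip the first tier.
class TwoTierCache<K, V> {
    private final Map<K, V> globalMap = new HashMap<K, V>();
    private final ThreadLocal<Map<K, V>> threadMap =
        new ThreadLocal<Map<K, V>>() {
            protected Map<K, V> initialValue() {
                return new HashMap<K, V>();
            }
        };

    V get(K key, boolean threadLocalView) {
        if (threadLocalView) {
            V value = threadMap.get().get(key);
            if (value != null) {
                return value;
            }
        }
        synchronized (globalMap) {
            return globalMap.get(key);
        }
    }

    void put(K key, V value, boolean threadLocalView) {
        if (threadLocalView) {
            threadMap.get().put(key, value);
        } else {
            synchronized (globalMap) {
                globalMap.put(key, value);
            }
        }
    }
}

When a thread dies, its first-tier map becomes unreachable and can be garbage-collected - exactly the clean-up that a thread-id key never gets.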

Julian


________________________________

From: mondrian-bounces (AT) pentaho (DOT) org [mailto:mondrian-bounces (AT) pentaho (DOT) org] On Behalf Of michael bienstein
Sent: Monday, January 22, 2007 12:06 PM
To: Mondrian developer mailing list
Subject: Re : [Mondrian] Re: VirtualCubeTest.testCalculatedMemberAcrossCubes failing on SMP


I've seen issues with the server-mode JIT before, related to memory barriers and multiple threads. But that was with multiple threads, and it was on JDK 1.4 (the memory model changed in 1.5, I think). The issue is that the instructions in your Java code can be run in a different order than you coded them. E.g. a=1; b=2; a=b; can be run as just a=2; b=2; because that is equivalent. The only way to force the order you actually expected is to synchronize your accesses, because that prevents instruction re-ordering across the memory barrier.

This was an issue in Apache Struts at one point, because it used a custom Map implementation called "FastHashMap" which is filled with values and then flipped into immutable mode. The problem was that the get() method tested whether the map had already been flipped without synchronizing. That looked safe, because the flip flag was set only after the insertion code, but the JIT reversed the order, so the flip happened before the last insertions, leading to intermittent problems on high-end servers.
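
A boiled-down version of that pattern, written from memory rather than from the actual Struts source:

import java.util.HashMap;
import java.util.Map;

// The unsafe "fill then flip" pattern. Without synchronization (or a
// volatile flag under the 1.5 memory model), the JIT and the hardware
// may publish 'fast = true' to another thread before the puts done in
// fill() are visible, so a reader can see a half-filled map.
class FastMapish {
    private final Map<String, String> map = new HashMap<String, String>();
    private boolean fast = false;   // the "flipped to read-only" flag

    void fill() {
        map.put("key", "value");    // insertions ...
        fast = true;                // ... then the flip; may be reordered
    }

    String get(String key) {
        if (fast) {
            // Unsynchronized read: looks safe because the flag is set
            // last, but the reordering above breaks that assumption.
            return map.get(key);
        }
        synchronized (this) {
            return map.get(key);
        }
    }
}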

All that's a moot point if we can't see how multiple threads are being used.

Michael


----- Original message ----
From: John V. Sichi <jsichi (AT) gmail (DOT) com>

Pappyn Bart
01-23-2007, 08:51 AM
Julian,

I think I am misunderstanding something.

I thought RolapResult executes in several passes:

A) Execute a stripe and record missing aggregations
B) Stop if there are no requests; otherwise load the missing aggregations

I thought B) would wait until every aggregation currently in the batch is loaded?

I thought B) would stop once A) was satisfied?

I thought A) could only be satisfied if get() of FastBatchingCellReader actually
returned a result instead of pushing another request onto the batch list?

FastBatchingCellReader calls the aggregation manager to load aggregations for each batch request.
It does not matter whether multiple stars are involved, since it asks the corresponding
star for an aggregation object, and that calls the load() method of that aggregation.
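
In code form, this is how I picture that loop (my own paraphrase; the method names are mine, not the actual RolapResult source):

// Paraphrase of the multi-pass execution as I understand it.
interface BatchingCellReader {
    boolean hasPendingRequests();   // did the last pass record misses?
    void loadAggregations();        // load the recorded batches
}

abstract class PassLoop {
    // Pass A: evaluate; get() on the reader either returns a value or
    // records a request. Pass B: load the batches and evaluate again.
    void execute(BatchingCellReader reader) {
        while (true) {
            evaluate(reader);                 // A) record misses
            if (!reader.hasPendingRequests()) {
                break;                        // A) satisfied: stop
            }
            reader.loadAggregations();        // B) load, then repeat
        }
    }

    abstract void evaluate(BatchingCellReader reader);
}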

Can you point me to the piece of code that would make RolapResult stop sooner than it should?

That would help me understand this piece of code better.

Thanks,
Bart



Julian Hyde
01-24-2007, 05:20 PM
Michael Bienstein wrote:

Just two thoughts on this:
1) Currently I think that a HashMap is used for the global cache. HashMap
is not thread-safe. There is a synchronized block that is probably too
large - it covers the whole aggregation data.

I still think that the solution outlined in my previous email will work. I
want to try that first. I just need time to try it. Which means I have to
stop reading/writing long emails and cleaning up other people's mess. :)



2) Two-tier using ThreadLocal sounds good. Can we do this idea:


Yes, but it doesn't help solve the immediate problem. The immediate problem
is that one of the tests in the regression suite has been broken for almost
a month, and I have promised a release of mondrian in January. So, we need
to stabilize mondrian.

Your idea will help solve the problem of mondrian working on a dynamic
database, but right now mondrian doesn't even work on a static database. Put
those ideas into an enhancement request and we can consider them after the
release.



import java.sql.Connection;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;
import javax.sql.DataSource;

interface QueryContext {
    Connection getConnection(DataSource ds);
    // ... some sort of common filter for the aggregation cache and
    // hierarchy caches ...
    void dispose();
}

class QueryContextImpl implements QueryContext {
    private final Map<DataSource, Connection> openConnections =
        new HashMap<DataSource, Connection>();

    public Connection getConnection(DataSource ds) {
        Connection c = openConnections.get(ds);
        if (c != null) {
            return c;
        }
        try {
            c = ds.getConnection();
            openConnections.put(ds, c);
            return c;
        } catch (SQLException ex) {
            throw new MondrianExceptionOrSomething(ex);
        }
    }

    // TODO some filtering to the global aggregation and hierarchy caches

    public void dispose() {
        for (Connection c : openConnections.values()) {
            try {
                c.close();
            } catch (SQLException ex) {
                // log it ...
            }
        }
    }
}

RolapResult.java:
{
    private static final ThreadLocal<QueryContext> qContext =
        new ThreadLocal<QueryContext>();

    public static QueryContext getQueryContext() {
        return qContext.get();
    }

    public RolapResult(...) {
        ...
        if (!execute) {
            return;
        }
        // Going to execute
        QueryContext qc = createQueryContext();
        qContext.set(qc);
        try {
            // Do execute stuff here
        } finally {
            qContext.remove();   // ThreadLocal has remove(), not clear()
            qc.dispose();
        }
        ...
    }

    // Use a property to override the class used? That way we can
    // configure each Connection specifically.
    public QueryContext createQueryContext() {
        return new QueryContextImpl();
    }
}

// All places in the code base that use a DataSource to obtain a
// Connection in the context of a query should use:
Connection c = RolapResult.getQueryContext().getConnection(ds);

That way we only open one connection per query and we use the database's
transaction system.

I found this hard to do because of the RolapConnection/RolapCube
constructors calling each other somehow (can't remember the details).

Michael
----- Original message ----
From: Julian Hyde <julianhyde (AT) speakeasy (DOT) net>

Pappyn Bart
01-26-2007, 03:53 AM
Julian,

Change 8582 introduces ThreadLocal; I do not use the thread id any more, so I guess your changes are redundant.
However, the change does more than the previous one.

As I already said on the mailing list last week, I was also busy implementing the data change listener plugin to support flushing
of the aggregate cache. That part is finished (however, I noticed I need to make a few things public to give my plugin access to the
necessary data, something I will likely check in today) and is in change 8582.

RolapStarAggregationKey is not obsolete any more, since in my latest implementation I must remember, besides the BitKey,
the timestamp at which the aggregation was first registered in the star.

This is because I have implemented the following:

* When a query starts, changes are checked (using the plugin). When changes are detected, a thread-local aggregation is made,
and the thread fills it up. It no longer changes the global cache, because another thread may depend on that data.

* After the query finishes, it first cleans up aggregates belonging to a star that does not maintain any cache.

* Afterwards, it tries to push the thread-local cache to the global cache, provided the global aggregate is not in use by any
thread. If that is not the case, the local copy is moved to a pending cache (which is not thread-local).

* After each query, the pending cache is checked again to see whether it can be promoted into the global cache.

RolapStarAggregationKey is needed because the timestamp is used to check in only the latest version of an aggregation into the
global cache.
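
In sketch form the key looks roughly like this (illustrative only; the exact fields and equality rules in change 8582 may differ, and BitKey is a stand-in):

// Illustration of the compound key: the BitKey says which columns the
// aggregation covers; the timestamp records when the aggregation was
// first registered in the star, so that only the latest version is
// checked in to the global cache.
class BitKey {
}

class RolapStarAggregationKey {
    private final BitKey bitKey;
    private final long timestamp;

    RolapStarAggregationKey(BitKey bitKey, long timestamp) {
        this.bitKey = bitKey;
        this.timestamp = timestamp;
    }

    BitKey bitKey() { return bitKey; }
    long timestamp() { return timestamp; }

    // Decides which of two versions of the same aggregation wins when
    // pushing thread-local or pending cache into the global cache.
    boolean isNewerThan(RolapStarAggregationKey other) {
        return timestamp > other.timestamp;
    }
}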

This is the first step in an attempt to make mondrian A) work better in multi-user environments and B) live alongside a dynamic database.

The only things that still need to be done are the introduction of transactions and better control over how JDBC connections are used.

The biggest problem so far is that the hierarchy cache is not in sync with an MDX query, so an MDX query might depend on
inconsistent data. In my project I don't see any errors, but that is largely due to how the database is structured and filled.

I have run the JUnit tests at least 10 times on different machines. I also ran with the JIT in 'server mode' on an SMP machine, because the behavior is different there.
All works up to now.

Kind regards,
Bart


________________________________

From: mondrian-bounces (AT) pentaho (DOT) org [mailto:mondrian-bounces (AT) pentaho (DOT) org] On Behalf Of Julian Hyde
Sent: Thursday 25 January 2007 22:00
To: mondrian (AT) pentaho (DOT) org
Subject: RE: [Mondrian] Re: VirtualCubeTest.testCalculatedMemberAcrossCubes failing on SMP


Bart,

Your change 8582 clashes with the stuff I referred to in the last paragraph of my January 23rd message (quoted earlier in this thread). The only reason I didn't check it in was that it wasn't sufficient -- on its own -- to fix the test failure, and now I either have to merge or discard my changes.

My change was to obsolete RolapStarAggregationKey and revert to BitKey to identify aggregations. Any aggregations which belong to a particular thread are in a collection specific to that thread. I am convinced that using thread-id is unsound - in particular, things will tend to stay in the cache after their thread has died, a problem which ThreadLocal neatly avoids.

I'd like you either to make that change, or to stop making changes in that area and give me the go-ahead to make it. Which do you prefer?

Julian

