[Mondrian] Multithreading, Parallel Batch / Query Execution



Ajit Vasudeo Joglekar
04-18-2007, 07:00 AM
Hello,

We are exploring ways to add parallel query execution to Mondrian. One
simple prototype we implemented iterates over the batches in
FastBatchingCellReader and runs each batch in a new worker thread. To
make thread-local state available to the worker threads, we changed
ThreadLocal to InheritableThreadLocal in RolapStar and other places.
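
To illustrate why the switch to InheritableThreadLocal matters, here is a
minimal standalone sketch. It is not Mondrian code: the class name and the
"query-context" value are made up for the example. It only shows that a
worker thread sees nothing in a plain ThreadLocal set by its parent, but
does see the parent's value in an InheritableThreadLocal.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class, not part of Mondrian.
class InheritableContextDemo {
    // Plain ThreadLocal: a value set by the parent is not visible in a child thread.
    static final ThreadLocal<String> PLAIN = new ThreadLocal<String>();
    // InheritableThreadLocal: a child thread receives the parent's value when it is created.
    static final ThreadLocal<String> INHERITED = new InheritableThreadLocal<String>();

    /** Starts a worker thread and returns what that worker reads from the given thread-local. */
    static String valueSeenByWorker(final ThreadLocal<String> local) {
        final AtomicReference<String> seen = new AtomicReference<String>();
        Thread worker = new Thread(new Runnable() {
            public void run() {
                seen.set(local.get());
            }
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        PLAIN.set("query-context");
        INHERITED.set("query-context");
        System.out.println("plain:     " + valueSeenByWorker(PLAIN));     // prints "plain:     null"
        System.out.println("inherited: " + valueSeenByWorker(INHERITED)); // prints "inherited: query-context"
    }
}
```

One caveat: the value is copied when the child thread is created, so
threads reused from a pool would keep whatever value they inherited at
creation time. That is likely to matter once thread pooling is added.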

The objectives are simple:
- If an MDX query results in more than one SQL statement for fact data,
these statements should run in parallel
- No support for explicit transactions or real-time updates to the database
- Thread pooling, life-cycle management and related concerns will follow
later
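
The first objective can be sketched roughly as follows. This is not the
prototype itself: each Callable is a hypothetical stand-in for "run the
SQL for one batch", and ParallelBatchSketch and runAll are names invented
for the example. It starts one new thread per batch (no pooling, matching
the third objective) and then waits for all of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

// Hypothetical sketch, not part of Mondrian.
class ParallelBatchSketch {
    /**
     * Starts one new thread per batch, then waits for every batch to finish.
     * Results are returned in the same order as the input batches.
     */
    static <T> List<T> runAll(List<? extends Callable<T>> batches) {
        List<FutureTask<T>> tasks = new ArrayList<FutureTask<T>>();
        for (Callable<T> batch : batches) {
            FutureTask<T> task = new FutureTask<T>(batch);
            tasks.add(task);
            new Thread(task).start(); // all batches are now running concurrently
        }
        List<T> results = new ArrayList<T>();
        for (FutureTask<T> task : tasks) {
            try {
                results.add(task.get()); // blocks until this batch has completed
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                // Surface the failure of the batch to the calling (query) thread.
                throw new IllegalStateException(e.getCause());
            }
        }
        return results;
    }
}
```

With this shape, the total wait is roughly that of the slowest batch
rather than the sum of all batches, provided the database and the
shared in-memory structures do not serialize the work.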

We ran QueryRunner under JProfiler to identify the biggest bottlenecks, as
indicated by thread monitor wait time. QueryRunner was run with 5
threads for 100 seconds, with random queries set to false (5, 100, false).
The profiling report was generated from a snapshot taken immediately after
QueryRunner exited. The machine was a Dell Latitude D620 with a Core 2 Duo
(2 cores), 2 GB RAM, Windows XP SP2 32-bit and Sun JDK 1.5.0_07. Mondrian
connects to MySQL 5 running on the same box.

Monitor Usage Statistics
Session: QueryRunner
Statistics: Monitor Usage Statistics Grouped by Monitors
Sorted by: Block duration

Without parallel batch execution

Monitor                                       Block count  Block duration
mondrian.rolap.RolapBaseCubeMeasure (id: 11)         3490  9700 ms
mondrian.rolap.RolapCube (id: 12)                    2426  10 s
mondrian.rolap.RolapSchema$Pool (id: 7)                 4  15 s
java.util.HashMap (id: 18)                          11851  35 s
mondrian.rolap.SmartMemberReader (id: 25)              24  42 s
java.lang.Class (id: 10)                           129733  139 s

With parallel batch execution

Monitor                                       Block count  Block duration
mondrian.rolap.RolapCube (id: 11)                    1846  10 s
mondrian.rolap.agg.Aggregation (id: 42)              1023  12 s
java.util.HashMap (id: 20)                           4435  19 s
mondrian.rolap.RolapSchema$Pool (id: 7)                 4  21 s
mondrian.rolap.SmartMemberReader (id: 23)              35  57 s
java.lang.Class (id: 5)                             91145  106 s

We are analyzing these options to reduce lock contention and improve
performance:
a) Replacing synchronization with read and write ReentrantLocks of
reduced scope
b) Using ConcurrentHashMap
c) Replacing synchronized lazy initialization with static initialization
at class load time
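
For readers unfamiliar with these three techniques, here is a generic
sketch of each using only java.util.concurrent. None of this is Mondrian
code: the class names (RwCache, ConcurrentCache, Registry) are invented
for illustration and do not correspond to the monitors listed above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, not part of Mondrian.
class ContentionOptionsSketch {

    // (a) Read/write lock: readers run concurrently; only writers are exclusive.
    static class RwCache {
        private final Map<String, String> map = new HashMap<String, String>();
        private final ReadWriteLock lock = new ReentrantReadWriteLock();

        String get(String key) {
            lock.readLock().lock();
            try {
                return map.get(key);
            } finally {
                lock.readLock().unlock();
            }
        }

        void put(String key, String value) {
            lock.writeLock().lock();
            try {
                map.put(key, value);
            } finally {
                lock.writeLock().unlock();
            }
        }
    }

    // (b) ConcurrentHashMap: no external lock needed; putIfAbsent is atomic.
    static class ConcurrentCache {
        private final ConcurrentMap<String, String> map =
            new ConcurrentHashMap<String, String>();

        /** Returns the value that ends up in the map: ours, or one added earlier. */
        String getOrAdd(String key, String candidate) {
            String previous = map.putIfAbsent(key, candidate);
            return previous != null ? previous : candidate;
        }
    }

    // (c) Static initialization: the JVM runs the initializer exactly once during
    // class initialization, so later reads need no synchronized block.
    static class Registry {
        static final Registry INSTANCE = new Registry();

        private Registry() {
        }
    }
}
```

Option (a) mainly helps where reads far outnumber writes; option (c)
trades lazy creation for paying the initialization cost when the class is
first loaded.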

To verify functional correctness under multithreaded batch execution, we
adapted QueryRunner to verify results. We observed an issue with
result set member ordering when we toggle
MondrianProperties.instance().DisableCaching between multiple test runs in
the same process. Apart from that, all the existing tests run successfully.

We tried out various optimizations with varying degrees of success.

This is a simplistic approach, but it seems to be working.

We will post more on this once we have more statistics to validate the
benefits of this approach.
Please let us know if there are any other approaches we should be
considering.

-Ajit
_______________________________________________
Mondrian mailing list
Mondrian (AT) pentaho (DOT) org
http://lists.pentaho.org/mailman/listinfo/mondrian

Pappyn Bart
04-18-2007, 07:11 AM
Hi,

Please make it configurable, so that applications needing support for
dynamic databases will also work.

Thanks,
Bart
