PDA

View Full Version : [Mondrian] Re: Optimizing Sync Blocks in Aggregation class



Thiyagu Palanisamy
05-31-2007, 09:00 AM
We have done some more improvements to our Approach 2 : Move the lock from
Aggregation to Segment wherever it is possible (CR 9366)

New Changes:
1. Synchronized block removed from AggregationManager.getCellFromCache()
The block was covering call to Aggregation.getCellValue() and
synchronization is handled with in Aggregation.getCellValue() method
itself.
2. Synchronized block removed from Aggregation.getCellValue()
This method was earlier synchronized to ensure that access to segmentRefs
is thread safe. Now we have made segmentRefs as CopyOnWriteArrayList
instance instead of ArrayList. This frees us up from handling the
Synchronized access to this list. We have selected CopyOnWriteArrayList
because <quote java doc> it is more efficient than alternatives when
traversal operations vastly outnumber mutations</quote java doc>, which is
the case in our scenario, there will be lot of get cellvalue calls but
only few times we will be modifying segments in the list.
3. Made Segment.getCellValue() Synchronized
As we have removed synchronization from Aggregation we are
synchronizing at Segment level.

We have uploaded the latest change to
http://forums.pentaho.org/showthread.php?t=54274
We also ran Jprofiler against all the 3 versions(Current implementation,
Approach1, Approach2) and the reports are attached to the same post. We
could see that lock contention is drastically reduced for data load path
(Aggregation object).

Thanks,
Thiyagu & Tushar




Thiyagu Palanisamy/India/ThoughtWorks
30/05/2007 20:29

To
Mondrian
cc

Subject
Optimizing Sync Blocks in Aggregation class





Hi,

We are proposing optimization of synchronization blocks involved in
loading of data from DB.

Current system is synchronized on Aggregation object while their segments
are loaded from DB. We did analysis using JProfiler to identify the
maximum lock contentions.

Issues with the current implementation:
1. Because of the lock at Aggregation, requests are forced to wait even
though they are not needed. Eg., If any previous request is loading a new
segment to Aggregation all the requests that need the same aggregation
will be blocked regardless of segment data availability.
2. Aggregation is locked for entire period of creating segment, firing
sql, loading result set back on to segments. So aggregation is locked when
DB is processing data.

We got 2 different working solutions to address this issue

Approach 1: Keep locking on Aggregation itself, but optimization the
duration lock (CR 9325)
Changes made for this solution:
1. Syncronized block removed from AggregationManager.load()
Because this blocks covers call to Aggregation.optimizePredicates(),
Aggregation.load() but both these methods are already Synchronized.
2. Made Aggregation.optimizePredicates() non synchronized
Uses maxConstraints,star fields of aggregation instance but they are
read only and set only in constructor.
Parameters to this method
columns : created at CellRequest, it is read only access in the
method. This is related Aggregation object but columns list will remain
same for the life time of the Aggregation class.
predicates : created at FBCR which is unique to a single request,
it is also read only and has no association to Aggregation object.
3. Made Aggregation.load() non synchronized
Uses constrainedColumnsBitKey,segmentRefs fields of aggregation
instance.
constrainedColumnsBitKey : This is read only and set in
constructor
segmentRefs : We are still synchronizing on aggregation when
accessing this
Parameters to this method
columns : created at CellRequest, this is set to columns
instance variable. columns list will remain same for the life time of the
Aggregation class so we have made change to set only at the first time
load is called on aggregation. This will happen only during creation of
Aggregation instance and it will be threadlocal at that point.
predicates : has no association to Aggregation object
pinnedSegments : has no association to Aggregation object
Measures: has no association to Aggregation object
4. Changes in Aggregation.load()
New Segments are added to segmentRefs only after they are loaded with
the data. This will ensure that none of the Segments are exposed to other
threads when they are loading.
Advantages with this approach
1. Aggregation object is locked only during the time of adding loaded
Segments to segmentRefs.
2. This approach requires only minor modifications to the existing
locking strategy
Disadvantage with this approach
If more than one request comes at the same time for a same new segment
of an existing aggregation, both the requests will fire query and load the
segment data. This issue is already exists in the current version too, but
window in which this can happen is small, this will happen only until one
of the request reaches the AggregationManager.load(). With the new change
this will happen until segment is set to AggregationManager.segmentRefs.
Approach 2: Move the lock from Aggregation to Segment wherever it is
possible (CR 9366)
1. Synchronized block removed from AggregationManager.load() : reason
same as approach 1
2. Made Aggregation.optimizePredicates() non synchronized : reason
same as Approach 1
3. Made Aggregation.load() non synchronized : reason same as Approach
1
4. Changes in Aggregation.load()
Aggregation object is altered only when New Segments are added to
segmentRefs, so we have Sync block covering add to segmentRefs
5. Made Aggregation.getCellValue() non synchronized
Only access to segmentRefs needs to be Synchronized, introduced
sync block to cover access to segmentRefs and when segment is not ready
and segment would contain the requested value(which means another thread
is loading the segment) current request will be blocked until segment is
loaded. And return value once segment is ready. If segment fails to load
it will return null similar to when segment is not found.
Advantage with this approach
1.Aggregation object is locked only during the time of adding Segments
to segmentRefs.
2. Concurrent requests will be blocked only if both the requests are
for same segment which is loading. This is a significant improvement from
Approach 1, because this reduces number of blocking requests.
Disadvantage with this approach
1.Locking strategy has been changed to part on Aggregation and part on
Segment. This also means that we need more testing.
2.We are using segment state for synchronization, this piece of code
already exists but not used currently.
We have attached both our implementations to the forum post:
http://forums.pentaho.org/showthread.php?t=54274
ConcurrentMDXTest class is to detected lock issues, this test is skewed
towards simulating concurrent access of same aggregation, this is not
valid for performance comparison of different implementations.

Please give your inputs on which approach we should move forward and
issues or improvements we can make on to that approach. Also it would be
of great help if you can give us some more scenarios for concurrent
testing.
Thanks,
Thiyagu, Tushar

_______________________________________________
Mondrian mailing list
Mondrian (AT) pentaho (DOT) org
http://lists.pentaho.org/mailman/listinfo/mondrian

Julian Hyde
06-11-2007, 03:51 AM
Thiyagu,

Both approaches seem to be an improvement. Approach 2 seems to be
better, especially if the only mutable state in Aggregation is the list
of Segments.

Re. our previous discussions on JDK 1.4 vs. 1.5. I understand that
CopyOnWriteArrayList is only available under 1.5. Can we use a factory
method to control the creation of this list, so that it can be made
available in JDK 1.4 using retroweaver?

I don't really understand the profiler results. They seem to be improved
in general, but there is a large number of blocks for HashMap in
approach 1. What caused that?

What is the high-level locking strategy? In particular, if multiple
objects need to be locked (say Aggregation and Segment), is there a
well-defined locking order.

When you implement this, please document the locking strategy in the
code. Maybe describe which fields are mutable and when, which classes to
lock and in what order. Less is more - people are more likely to read,
obey and maintain documentation which is terse and to the point.

I want to rationalize thread-locals out of the design and into real
objects with a stated purpose and strategy. If you can start to do that
with the design, I would appreciate it.

Lastly, please rename ConcurrentMDXTest to ConcurrentMdxTest; our coding
standards call for acronyms to be InitCapped not CAPITALIZED.

Julian

PS Sorry I didn't reply earlier. This reply somehow ended up in my
Drafts folder and I forgot to send it.


_____

From: mondrian-bounces (AT) pentaho (DOT) org [mailto:mondrian-bounces (AT) pentaho (DOT) org]
On Behalf Of Thiyagu Palanisamy
Sent: Thursday, May 31, 2007 5:42 AM
To: mondrian (AT) pentaho (DOT) org
Subject: [Mondrian] Re: Optimizing Sync Blocks in Aggregation class



We have done some more improvements to our Approach 2 : Move the lock
from Aggregation to Segment wherever it is possible (CR 9366)

New Changes:
1. Synchronized block removed from AggregationManager.getCellFromCache()

The block was covering call to Aggregation.getCellValue() and
synchronization is handled with in Aggregation.getCellValue() method
itself.
2. Synchronized block removed from Aggregation.getCellValue()
This method was earlier synchronized to ensure that access to
segmentRefsis thread safe. Now we have made segmentRefs as
CopyOnWriteArrayList instance instead of ArrayList. This frees us up
from handling the Synchronized access to this list. We have selected
CopyOnWriteArrayList because <quote java doc> it is more efficient than
alternatives when traversal operations vastly outnumber mutations</quote
java doc>, which is the case in our scenario, there will be lot of get
cellvalue calls but only few times we will be modifying segments in the
list.
3. Made Segment.getCellValue() Synchronized
As we have removed synchronization from Aggregation we are
synchronizing at Segment level.


We have uploaded the latest change to
<http://forums.pentaho.org/showthread.php?t=54274>
http://forums.pentaho.org/showthread.php?t=54274


We also ran Jprofiler against all the 3 versions(Current implementation,
Approach1, Approach2) and the reports are attached to the same post. We
could see that lock contention is drastically reduced for data load path
(Aggregation object).



Thanks,


Thiyagu & Tushar





Thiyagu Palanisamy/India/ThoughtWorks


30/05/2007 20:29


To
Mondrian

cc

Subject
Optimizing Sync Blocks in Aggregation class





Hi,

We are proposing optimization of synchronization blocks involved in
loading of data from DB.

Current system is synchronized on Aggregation object while their
segments are loaded from DB. We did analysis using JProfiler to identify
the maximum lock contentions.

Issues with the current implementation:
1. Because of the lock at Aggregation, requests are forced to wait even
though they are not needed. Eg., If any previous request is loading a
new segment to Aggregation all the requests that need the same
aggregation will be blocked regardless of segment data availability.
2. Aggregation is locked for entire period of creating segment, firing
sql, loading result set back on to segments. So aggregation is locked
when DB is processing data.

We got 2 different working solutions to address this issue

Approach 1: Keep locking on Aggregation itself, but optimization the
duration lock (CR 9325)
Changes made for this solution:
1. Syncronized block removed from AggregationManager.load()
Because this blocks covers call to Aggregation.optimizePredicates(),
Aggregation.load() but both these methods are already Synchronized.
2. Made Aggregation.optimizePredicates() non synchronized
Uses maxConstraints,star fields of aggregation instance but they are
read only and set only in constructor.
Parameters to this method
columns : created at CellRequest, it is read only access in the
method. This is related Aggregation object but columns list will remain
same for the life time of the Aggregation class.
predicates : created at FBCR which is unique to a single
request, it is also read only and has no association to Aggregation
object.
3. Made Aggregation.load() non synchronized
Uses constrainedColumnsBitKey,segmentRefs fields of aggregation
instance.
constrainedColumnsBitKey : This is read only and set in
constructor
segmentRefs : We are still synchronizing on aggregation when
accessing this
Parameters to this method
columns : created at CellRequest, this is set to columns
instance variable. columns list will remain same for the life time of
the Aggregation class so we have made change to set only at the first
time load is called on aggregation. This will happen only during
creation of Aggregation instance and it will be threadlocal at that
point.
predicates : has no association to Aggregation object
pinnedSegments : has no association to Aggregation object
Measures: has no association to Aggregation object
4. Changes in Aggregation.load()
New Segments are added to segmentRefs only after they are loaded with
the data. This will ensure that none of the Segments are exposed to
other threads when they are loading.

Advantages with this approach
1. Aggregation object is locked only during the time of adding loaded
Segments to segmentRefs.
2. This approach requires only minor modifications to the existing
locking strategy


Disadvantage with this approach
If more than one request comes at the same time for a same new
segment of an existing aggregation, both the requests will fire query
and load the segment data. This issue is already exists in the current
version too, but window in which this can happen is small, this will
happen only until one of the request reaches the
AggregationManager.load(). With the new change this will happen until
segment is set to AggregationManager.segmentRefs.


Approach 2: Move the lock from Aggregation to Segment wherever it is
possible (CR 9366)
1. Synchronized block removed from AggregationManager.load() : reason
same as approach 1
2. Made Aggregation.optimizePredicates() non synchronized : reason
same as Approach 1
3. Made Aggregation.load() non synchronized : reason same as Approach
1
4. Changes in Aggregation.load()
Aggregation object is altered only when New Segments are added to
segmentRefs, so we have Sync block covering add to segmentRefs
5. Made Aggregation.getCellValue() non synchronized
Only access to segmentRefs needs to be Synchronized, introduced
sync block to cover access to segmentRefs and when segment is not ready
and segment would contain the requested value(which means another thread
is loading the segment) current request will be blocked until segment is
loaded. And return value once segment is ready. If segment fails to load
it will return null similar to when segment is not found.


Advantage with this approach
1.Aggregation object is locked only during the time of adding
Segments to segmentRefs.
2. Concurrent requests will be blocked only if both the requests are
for same segment which is loading. This is a significant improvement
from Approach 1, because this reduces number of blocking requests.


Disadvantage with this approach
1.Locking strategy has been changed to part on Aggregation and part
on Segment. This also means that we need more testing.
2.We are using segment state for synchronization, this piece of code
already exists but not used currently.


We have attached both our implementations to the forum post:
http://forums.pentaho.org/showthread.php?t=54274
ConcurrentMDXTest class is to detected lock issues, this test is skewed
towards simulating concurrent access of same aggregation, this is not
valid for performance comparison of different implementations.

Please give your inputs on which approach we should move forward and
issues or improvements we can make on to that approach. Also it would be
of great help if you can give us some more scenarios for concurrent
testing.


Thanks,
Thiyagu, Tushar





_______________________________________________
Mondrian mailing list
Mondrian (AT) pentaho (DOT) org
http://lists.pentaho.org/mailman/listinfo/mondrian

Thiyagu Palanisamy
06-11-2007, 11:40 AM
Julian,

Thanks for the review.

I have checked in the Approach 2.

>>Re. our previous discussions on JDK 1.4 vs. 1.5. I understand that
CopyOnWriteArrayList is only available under 1.5. Can we use a factory
method to control the creation of this list, so that it can be made
available in JDK 1.4 using retroweaver?

Retroweaver supports CopyOnWriteArrayList, we have test it. (Our previous
discussions was about ReentrantReadWriteLock)

>>I don't really understand the profiler results. They seem to be improved
in general, but there is a large number of blocks for HashMap in approach
1. What caused that?

I didn't spend time on optimizing/analysing the approach 1 because we were
inclined towards approach 2.

>>What is the high-level locking strategy? In particular, if multiple
objects need to be locked (say Aggregation and Segment), is there a
well-defined locking order.
We have removed all locking on Aggregation(Segment list is synchronized by
itself), we are locking only on Segment and all the locking is done by
method level synchronization, so we don't need to bother about explicit
locking before any call.

>> please document the locking strategy in the code
Updated details about fields and methods which are newly synchronized.

- Thiyagu




"Julian Hyde" <julianhyde (AT) speakeasy (DOT) net>
Sent by: mondrian-bounces (AT) pentaho (DOT) org
11/06/2007 13:15
Please respond to
Mondrian developer mailing list <mondrian (AT) pentaho (DOT) org>


To
"'Mondrian developer mailing list'" <mondrian (AT) pentaho (DOT) org>
cc

Subject
RE: [Mondrian] Re: Optimizing Sync Blocks in Aggregation class






Thiyagu,

Both approaches seem to be an improvement. Approach 2 seems to be better,
especially if the only mutable state in Aggregation is the list of
Segments.

Re. our previous discussions on JDK 1.4 vs. 1.5. I understand that
CopyOnWriteArrayList is only available under 1.5. Can we use a factory
method to control the creation of this list, so that it can be made
available in JDK 1.4 using retroweaver?

I don't really understand the profiler results. They seem to be improved
in general, but there is a large number of blocks for HashMap in approach
1. What caused that?

What is the high-level locking strategy? In particular, if multiple
objects need to be locked (say Aggregation and Segment), is there a
well-defined locking order.

When you implement this, please document the locking strategy in the code.
Maybe describe which fields are mutable and when, which classes to lock
and in what order. Less is more - people are more likely to read, obey and
maintain documentation which is terse and to the point.

I want to rationalize thread-locals out of the design and into real
objects with a stated purpose and strategy. If you can start to do that
with the design, I would appreciate it.

Lastly, please rename ConcurrentMDXTest to ConcurrentMdxTest; our coding
standards call for acronyms to be InitCapped not CAPITALIZED.

Julian

PS Sorry I didn't reply earlier. This reply somehow ended up in my Drafts
folder and I forgot to send it.

From: mondrian-bounces (AT) pentaho (DOT) org [mailto:mondrian-bounces (AT) pentaho (DOT) org]
On Behalf Of Thiyagu Palanisamy
Sent: Thursday, May 31, 2007 5:42 AM
To: mondrian (AT) pentaho (DOT) org
Subject: [Mondrian] Re: Optimizing Sync Blocks in Aggregation class


We have done some more improvements to our Approach 2 : Move the lock from
Aggregation to Segment wherever it is possible (CR 9366)

New Changes:
1. Synchronized block removed from AggregationManager.getCellFromCache()
The block was covering call to Aggregation.getCellValue() and
synchronization is handled with in Aggregation.getCellValue() method
itself.
2. Synchronized block removed from Aggregation.getCellValue()
This method was earlier synchronized to ensure that access to segmentRefs
is thread safe. Now we have made segmentRefs as CopyOnWriteArrayList
instance instead of ArrayList. This frees us up from handling the
Synchronized access to this list. We have selected CopyOnWriteArrayList
because <quote java doc> it is more efficient than alternatives when
traversal operations vastly outnumber mutations</quote java doc>, which is
the case in our scenario, there will be lot of get cellvalue calls but
only few times we will be modifying segments in the list.
3. Made Segment.getCellValue() Synchronized
As we have removed synchronization from Aggregation we are
synchronizing at Segment level.
We have uploaded the latest change to
http://forums.pentaho.org/showthread.php?t=54274
We also ran Jprofiler against all the 3 versions(Current implementation,
Approach1, Approach2) and the reports are attached to the same post. We
could see that lock contention is drastically reduced for data load path
(Aggregation object).
Thanks,
Thiyagu & Tushar



Thiyagu Palanisamy/India/ThoughtWorks
30/05/2007 20:29


To
Mondrian
cc

Subject
Optimizing Sync Blocks in Aggregation class







Hi,

We are proposing optimization of synchronization blocks involved in
loading of data from DB.

Current system is synchronized on Aggregation object while their segments
are loaded from DB. We did analysis using JProfiler to identify the
maximum lock contentions.

Issues with the current implementation:
1. Because of the lock at Aggregation, requests are forced to wait even
though they are not needed. Eg., If any previous request is loading a new
segment to Aggregation all the requests that need the same aggregation
will be blocked regardless of segment data availability.
2. Aggregation is locked for entire period of creating segment, firing
sql, loading result set back on to segments. So aggregation is locked when
DB is processing data.

We got 2 different working solutions to address this issue

Approach 1: Keep locking on Aggregation itself, but optimization the
duration lock (CR 9325)
Changes made for this solution:
1. Syncronized block removed from AggregationManager.load()
Because this blocks covers call to Aggregation.optimizePredicates(),
Aggregation.load() but both these methods are already Synchronized.
2. Made Aggregation.optimizePredicates() non synchronized
Uses maxConstraints,star fields of aggregation instance but they are
read only and set only in constructor.
Parameters to this method
columns : created at CellRequest, it is read only access in the
method. This is related Aggregation object but columns list will remain
same for the life time of the Aggregation class.
predicates : created at FBCR which is unique to a single request,
it is also read only and has no association to Aggregation object.
3. Made Aggregation.load() non synchronized
Uses constrainedColumnsBitKey,segmentRefs fields of aggregation
instance.
constrainedColumnsBitKey : This is read only and set in
constructor
segmentRefs : We are still synchronizing on aggregation when
accessing this
Parameters to this method
columns : created at CellRequest, this is set to columns
instance variable. columns list will remain same for the life time of the
Aggregation class so we have made change to set only at the first time
load is called on aggregation. This will happen only during creation of
Aggregation instance and it will be threadlocal at that point.
predicates : has no association to Aggregation object
pinnedSegments : has no association to Aggregation object
Measures: has no association to Aggregation object
4. Changes in Aggregation.load()
New Segments are added to segmentRefs only after they are loaded with
the data. This will ensure that none of the Segments are exposed to other
threads when they are loading.
Advantages with this approach
1. Aggregation object is locked only during the time of adding loaded
Segments to segmentRefs.
2. This approach requires only minor modifications to the existing
locking strategy
Disadvantage with this approach
If more than one request comes at the same time for a same new segment
of an existing aggregation, both the requests will fire query and load the
segment data. This issue is already exists in the current version too, but
window in which this can happen is small, this will happen only until one
of the request reaches the AggregationManager.load(). With the new change
this will happen until segment is set to AggregationManager.segmentRefs.
Approach 2: Move the lock from Aggregation to Segment wherever it is
possible (CR 9366)
1. Synchronized block removed from AggregationManager.load() : reason
same as approach 1
2. Made Aggregation.optimizePredicates() non synchronized : reason same
as Approach 1
3. Made Aggregation.load() non synchronized : reason same as Approach 1
4. Changes in Aggregation.load()
Aggregation object is altered only when New Segments are added to
segmentRefs, so we have Sync block covering add to segmentRefs
5. Made Aggregation.getCellValue() non synchronized
Only access to segmentRefs needs to be Synchronized, introduced
sync block to cover access to segmentRefs and when segment is not ready
and segment would contain the requested value(which means another thread
is loading the segment) current request will be blocked until segment is
loaded. And return value once segment is ready. If segment fails to load
it will return null similar to when segment is not found.
Advantage with this approach
1.Aggregation object is locked only during the time of adding Segments
to segmentRefs.
2. Concurrent requests will be blocked only if both the requests are
for same segment which is loading. This is a significant improvement from
Approach 1, because this reduces number of blocking requests.
Disadvantage with this approach
1.Locking strategy has been changed to part on Aggregation and part on
Segment. This also means that we need more testing.
2.We are using segment state for synchronization, this piece of code
already exists but not used currently.
We have attached both our implementations to the forum post:
http://forums.pentaho.org/showthread.php?t=54274
ConcurrentMDXTest class is to detected lock issues, this test is skewed
towards simulating concurrent access of same aggregation, this is not
valid for performance comparison of different implementations.

Please give your inputs on which approach we should move forward and
issues or improvements we can make on to that approach. Also it would be
of great help if you can give us some more scenarios for concurrent
testing.
Thanks,
Thiyagu, Tushar _______________________________________________
Mondrian mailing list
Mondrian (AT) pentaho (DOT) org
http://lists.pentaho.org/mailman/listinfo/mondrian


_______________________________________________
Mondrian mailing list
Mondrian (AT) pentaho (DOT) org
http://lists.pentaho.org/mailman/listinfo/mondrian