I am building monitoring support. So far I have added support for connections, statements, executions, and SQL statements; for example, we know how many cell accesses an execution has made. A user of the monitoring API can subscribe to events (which indicate that something of interest has happened, say that a statement has ended) and can poll for lists of objects (connections, statements, etc.).
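To make the two access styles concrete, here is a minimal sketch of a monitor that supports both push (subscribe to events) and pull (poll for a snapshot list). All names here (`Monitor`, `MonitorEvent`, `StatementEndEvent`, `ExecutionInfo`, `subscribe`, `fire`, `getExecutions`) are hypothetical and are not the actual mondrian API; the sketch assumes events are delivered synchronously on the thread that fires them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical base type for "something of interest has happened".
abstract class MonitorEvent {
    final long timestamp;

    MonitorEvent(long timestamp) {
        this.timestamp = timestamp;
    }
}

// Hypothetical event fired when a statement ends.
final class StatementEndEvent extends MonitorEvent {
    final long statementId;

    StatementEndEvent(long timestamp, long statementId) {
        super(timestamp);
        this.statementId = statementId;
    }
}

// Hypothetical snapshot of an execution, returned when polling.
final class ExecutionInfo {
    final long executionId;
    final long cellAccessCount;

    ExecutionInfo(long executionId, long cellAccessCount) {
        this.executionId = executionId;
        this.cellAccessCount = cellAccessCount;
    }
}

// Hypothetical monitor offering both push (subscribe) and pull (poll) access.
final class Monitor {
    private final List<Consumer<MonitorEvent>> listeners = new CopyOnWriteArrayList<>();
    private final List<ExecutionInfo> executions = new CopyOnWriteArrayList<>();

    // Push: the listener is called for every event fired from now on.
    void subscribe(Consumer<MonitorEvent> listener) {
        listeners.add(listener);
    }

    // Called by the engine when something of interest happens.
    void fire(MonitorEvent event) {
        for (Consumer<MonitorEvent> listener : listeners) {
            listener.accept(event);
        }
    }

    // Called by the engine to record an execution's statistics.
    void addExecution(ExecutionInfo info) {
        executions.add(info);
    }

    // Pull: returns a copy, so later engine updates do not change the caller's list.
    List<ExecutionInfo> getExecutions() {
        return new ArrayList<>(executions);
    }
}
```

Returning a copy from the poll method means a monitoring client sees a consistent snapshot even while the engine keeps adding executions.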

I have not yet modeled segment cache activity. It is tricky because I first need to clarify the lifecycle model for a segment. Here's what I have so far. A segment starts as "private" and empty. Later it is populated. After the statement has finished, it moves to "local". Later (or immediately?) it is published to Infinispan. The segment can then be said to exist both in Infinispan and locally, so I would call it "resident".

I'll posit that a segment is always in precisely one of these states:
"pending" (private to a statement; data set not yet loaded)
"private" (private to a statement; data set loaded)
"local" (in the JVM but not yet published to Infinispan)
"resident" (in the JVM and also in Infinispan)
"paged" (in Infinispan but not in the JVM)
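One way to pin the proposed lifecycle down is as an enum whose constants are the five states and which records the allowed transitions. This is a sketch under assumptions: the name `SegmentState` and the `next`/`canMoveTo` methods are hypothetical, and while pending → private → local → resident follows the description above, the resident → paged and paged → resident transitions (eviction from the JVM, and fetching back from Infinispan) are my assumptions.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical encoding of the proposed lifecycle; a segment is in exactly one state.
enum SegmentState {
    PENDING,   // private to a statement; data set not yet loaded
    PRIVATE,   // private to a statement; data set loaded
    LOCAL,     // in the JVM but not yet published to Infinispan
    RESIDENT,  // in the JVM and also in Infinispan
    PAGED;     // in Infinispan but not in the JVM

    // Successor states allowed from this state.
    Set<SegmentState> next() {
        switch (this) {
            case PENDING:
                return EnumSet.of(PRIVATE);   // data set is loaded
            case PRIVATE:
                return EnumSet.of(LOCAL);     // statement has finished
            case LOCAL:
                return EnumSet.of(RESIDENT);  // published to Infinispan
            case RESIDENT:
                return EnumSet.of(PAGED);     // assumption: dropped from the JVM
            case PAGED:
                return EnumSet.of(RESIDENT);  // assumption: fetched back into the JVM
            default:
                throw new AssertionError(this);
        }
    }

    boolean canMoveTo(SegmentState target) {
        return next().contains(target);
    }
}
```

Making the transitions explicit would also give the monitoring code a natural place to fire a "segment state changed" event and to reject impossible moves.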

For each of these states we would keep totals: thus pendingSegmentCount, pendingSegmentByteCount, pendingSegmentCellCount, privateSegmentCount, etc.
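Because a segment is always in precisely one state, the 15 totals (5 states × 3 measures) stay consistent if every transition subtracts the segment from one state's totals and adds it to another's. A minimal sketch, assuming hypothetical classes `SegmentStats` and `SegmentTotals` keyed by an enum rather than 15 separate fields; `segmentCount(SegmentState.PENDING)` would back `pendingSegmentCount`, and so on.

```java
import java.util.EnumMap;
import java.util.Map;

enum SegmentState { PENDING, PRIVATE, LOCAL, RESIDENT, PAGED }

// Running totals for one state.
final class SegmentTotals {
    long segmentCount;
    long byteCount;
    long cellCount;
}

final class SegmentStats {
    private final Map<SegmentState, SegmentTotals> totals = new EnumMap<>(SegmentState.class);

    SegmentStats() {
        for (SegmentState state : SegmentState.values()) {
            totals.put(state, new SegmentTotals());
        }
    }

    // A segment enters a state with the given size.
    synchronized void add(SegmentState state, long bytes, long cells) {
        SegmentTotals t = totals.get(state);
        t.segmentCount++;
        t.byteCount += bytes;
        t.cellCount += cells;
    }

    // A segment leaves a state; bytes and cells must match what was added.
    synchronized void remove(SegmentState state, long bytes, long cells) {
        SegmentTotals t = totals.get(state);
        t.segmentCount--;
        t.byteCount -= bytes;
        t.cellCount -= cells;
    }

    // Size-preserving transition (e.g. LOCAL -> RESIDENT): the segment's
    // contribution moves from one state's totals to another's.
    // For PENDING -> PRIVATE, where the data arrives, call remove then add instead.
    synchronized void move(SegmentState from, SegmentState to, long bytes, long cells) {
        remove(from, bytes, cells);
        add(to, bytes, cells);
    }

    // segmentCount(PENDING) would back pendingSegmentCount, and so on.
    synchronized long segmentCount(SegmentState state) {
        return totals.get(state).segmentCount;
    }

    synchronized long byteCount(SegmentState state) {
        return totals.get(state).byteCount;
    }

    synchronized long cellCount(SegmentState state) {
        return totals.get(state).cellCount;
    }
}
```

The flat names could still be exposed as getters for the monitoring API while the bookkeeping itself stays keyed by state.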

Do you agree with this breakdown? Do you agree with these names?

Another design problem: how do we represent the action of the cache control API? The simplest approach may be to say that it deletes a segment and creates another (with some of the cells knocked out). There would be a "segment delete" event and a "segment create" event.
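The delete-then-create model can be sketched by treating segments as immutable: flushing never edits a segment in place, it retires it and, if any cells survive, registers a replacement. All names here (`Segment`, `SegmentDeleteEvent`, `SegmentCreateEvent`, `CacheControlSketch`, `flush`) are hypothetical, cells are simplified to a `String` key and `Double` value, and emitting only a delete event when every cell is flushed is my assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Predicate;

// Hypothetical immutable segment: an id plus cell values keyed by cell coordinates.
final class Segment {
    final long id;
    final Map<String, Double> cells;

    Segment(long id, Map<String, Double> cells) {
        this.id = id;
        this.cells = Map.copyOf(cells); // defensive, immutable copy
    }
}

abstract class SegmentEvent {
    final Segment segment;

    SegmentEvent(Segment segment) {
        this.segment = segment;
    }
}

final class SegmentDeleteEvent extends SegmentEvent {
    SegmentDeleteEvent(Segment segment) {
        super(segment);
    }
}

final class SegmentCreateEvent extends SegmentEvent {
    SegmentCreateEvent(Segment segment) {
        super(segment);
    }
}

final class CacheControlSketch {
    private final AtomicLong nextId = new AtomicLong(1000);
    final List<SegmentEvent> events = new ArrayList<>();

    // Flushing cells never mutates a segment: the old segment is deleted and,
    // if any cells survive, a new segment holding just those cells is created.
    Optional<Segment> flush(Segment old, Predicate<String> isFlushed) {
        Map<String, Double> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : old.cells.entrySet()) {
            if (!isFlushed.test(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        events.add(new SegmentDeleteEvent(old));
        if (kept.isEmpty()) {
            return Optional.empty(); // assumption: nothing survives, so no create event
        }
        Segment replacement = new Segment(nextId.getAndIncrement(), kept);
        events.add(new SegmentCreateEvent(replacement));
        return Optional.of(replacement);
    }
}
```

A listener that only understands "segment created" and "segment deleted" can then track cache contents without knowing anything about cache control.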

Lastly, something I noticed during my investigations: it seemed strange that mondrian doesn't manage segments individually; rather, it manages aggregations (groups of segments with the same dimensionality). For example, there is a collection "RolapStar.pendingAggregations" that "holds all pending aggregations of this star that are waiting to be pushed into the global cache". Aggregations are of course a useful indexing strategy, but I think mondrian should be operating on individual segments. The monitoring events would certainly make more sense that way, and we should also eliminate the Aggregation class, or at least slim it down and minimize its use.
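Here is a sketch of what "operate on individual segments, keep aggregations as an index" might look like: segments are stored and removed one at a time by id, and grouping by dimensionality is merely a secondary lookup that can be rebuilt at any time. The names `SegmentHeader` and `SegmentIndex` are hypothetical, and dimensionality is simplified to a set of column names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical segment header: an id plus the columns (dimensionality) it is constrained on.
final class SegmentHeader {
    final long id;
    final Set<String> dimensionality;

    SegmentHeader(long id, Set<String> dimensionality) {
        this.id = id;
        this.dimensionality = Set.copyOf(dimensionality);
    }
}

// Segments are the unit of ownership; dimensionality is just an index over them.
final class SegmentIndex {
    private final Map<Long, SegmentHeader> byId = new HashMap<>();
    private final Map<Set<String>, Set<Long>> byDimensionality = new HashMap<>();

    synchronized void add(SegmentHeader h) {
        byId.put(h.id, h);
        byDimensionality.computeIfAbsent(h.dimensionality, k -> new HashSet<>()).add(h.id);
    }

    // Removes one segment; drops the index entry when its group becomes empty.
    synchronized boolean remove(long id) {
        SegmentHeader h = byId.remove(id);
        if (h == null) {
            return false;
        }
        Set<Long> ids = byDimensionality.get(h.dimensionality);
        ids.remove(id);
        if (ids.isEmpty()) {
            byDimensionality.remove(h.dimensionality);
        }
        return true;
    }

    // The "aggregation" view: all segments sharing a dimensionality.
    synchronized List<SegmentHeader> withDimensionality(Set<String> dims) {
        List<SegmentHeader> result = new ArrayList<>();
        for (long id : byDimensionality.getOrDefault(dims, Set.of())) {
            result.add(byId.get(id));
        }
        return result;
    }

    synchronized int size() {
        return byId.size();
    }
}
```

With this shape, each add or remove maps one-to-one onto a segment create or delete event, which is what the monitoring API wants to report.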

Mondrian mailing list
Mondrian (AT) pentaho (DOT) org