PDA

View Full Version : [Mondrian] RE: List of Members memory usage



Julian Hyde
01-09-2007, 06:51 PM
xx


-----Original Message-----

From: Richard Emberson [ <mailto:remberson (AT) edgedynamics (DOT) com>
mailto:remberson (AT) edgedynamics (DOT) com]
Sent: Tuesday, January 09, 2007 6:19 AM
To: Julian Hyde
Subject: Re: List of Members memory usage

Concerning the new Position interface:

interface Position extends List<Member> {
}

we also agreed that it would be an unmodifiable list and that most of its
methods would throw an UnsupportedOperationException. Specifically, the only
methods supported would be:

boolean isEmpty();
int size();
Iterator<Member> iterator();
Member get(int index);

and possibly

ListIterator<Member> listIterator();
ListIterator<Member> listIterator(int index);

where the iterators would be readonly.

Agreed.

Concerning ITERABLE:
1) ITERABLEs are readonly,
2) it should be noted that when a 'sort' is required, then the collection of
Members must be materialized because a sort requires the ability to modify
the order of the Members in the collection.

[With xslt's one can stream the transformation of a document unless one does
a 'sort', in which case, the whole document has to be read into memory at
the same time. There is a similar issue when using LISTs or ITERABLEs when a
sort is encountered.]

Again, agreed. Note that unlike Position, Iterable is for internal use
within mondrian, in implementing MDX expressions. It's OK if its use is
somewhat circumscribed.

Sort is the classic example of a blocking operation. It breaks the iterator
model, but we hope that in practice, there are ways around it, such as
reading from an underlying source which is already sorted.

One important usage of sort is the Rank function. We know that sorting the
sets underlying a rank operator is expensive, so we introduce a 'cached
expression' which ensures that the list is only sorted once. If I recall,
the list stored in the cache is specially indexed so we can compute rank
more efficiently.

3) Iterable is a Java5 interface:

public interface Iterable<T> {
Iterator<T> iterator();
}

which means that the consumer can not tell how many members it has without
iterating through them.

Yes, it was introduced in 1.5. But retroweaver has its own implementation of
Iterable for 1.4.

It's true that you don't know the number of elements an iterator is going to
return. That is a disadvantage for the consumer (because it has to operate
with less information) but an advantage to the producer (because it has less
information to deliver), so things tend to even out.

We can discuss adding 'added value' versions of iterable if there is
information which is cheap for producers to compute and extremely useful to
consumers. E.g.

interface IterableWithCount<T> extends Iterable<T> {
int size();
}

If the consumer absolutely has to know the number of elements in the set
(e.g. the Tail function) then it should ask the producer for a list.

Sometimes it is sufficient for the consumer to have an approximate count. If
so ,we can introduce cost metrics into the expression model.

Julian


_______________________________________________
Mondrian mailing list
Mondrian (AT) pentaho (DOT) org
http://lists.pentaho.org/mailman/listinfo/mondrian

Richard Emberson
01-10-2007, 12:20 PM
I just happened upon the following article:
http://today.java.net/pub/a/today/2006/11/07/nuances-of-java-5-for-each-loop.html
- in particular 'Item 11'.
Does anyone know if the Sun Java5 runtime compiler implements
such an optimization?
If one is creating a list that accesses another list, then
one ought to pay attention to whether the wrapped list
implements java.util.RandomAccess. If it does, then the
wrapping list ought to implement java.util.RandomAccess.
In particular, the new RolapAxis implementations all
assume that the underlying list (Member or Member[])
implement java.util.RandomAccess. Currently, I believe
this is the case since ArrayList is being used, but
the underlying list's type might change in the future.

If one had an Iterable that was under the covers wrapping
a ResultSet, then one would want to make sure that the
compiler did not assume that one had a RandomAccess list.

Richard


Julian Hyde wrote:
> xx
>
>
> -----Original Message-----
>
> From: Richard Emberson [_mailto:remberson (AT) edgedynamics (DOT) com_]
> Sent: Tuesday, January 09, 2007 6:19 AM
> To: Julian Hyde
> Subject: Re: List of Members memory usage
>
> /Concerning the new Position interface:/
>
> /interface Position extends List<Member> {
> }/
>
> /we also agreed that it would be an unmodifiable list and that most
> of its methods would throw an UnsupportedOperationException.
> Specifically, the only methods supported would be:/
>
> /boolean isEmpty();
> int size();
> Iterator<Member> iterator();
> Member get(int index);/
>
> /and possibly/
>
> /ListIterator<Member> listIterator();
> ListIterator<Member> listIterator(int index);/
>
> /where the iterators would be readonly./
>
> Agreed.
>
> /Concerning ITERABLE:
> 1) ITERABLEs are readonly,
> 2) it should be noted that when a 'sort' is required, then the
> collection of Members must be materialized because a sort requires
> the ability to modify the order of the Members in the collection./
>
> /[With xslt's one can stream the transformation of a document unless
> one does a 'sort', in which case, the whole document has to be read
> into memory at the same time. There is a similar issue when using
> LISTs or ITERABLEs when a sort is encountered.]/
>
> Again, agreed. Note that unlike Position, Iterable is for internal use
> within mondrian, in implementing MDX expressions. It's OK if its use is
> somewhat circumscribed.
>
> Sort is the classic example of a blocking operation. It breaks the
> iterator model, but we hope that in practice, there are ways around it,
> such as reading from an underlying source which is already sorted.
>
> One important usage of sort is the Rank function. We know that sorting
> the sets underlying a rank operator is expensive, so we introduce a
> 'cached expression' which ensures that the list is only sorted once. If
> I recall, the list stored in the cache is specially indexed so we can
> compute rank more efficiently.
>
> /3) Iterable is a Java5 interface:/
>
> /public interface Iterable<T> {
> //Iterator<T> iterator();
> //}/
>
> /which means that the consumer can not tell how many members it has
> without iterating through them./
>
> Yes, it was introduced in 1.5. But retroweaver has its own
> implementation of Iterable for 1.4.
>
> It's true that you don't know the number of elements an iterator is
> going to return. That is a disadvantage for the consumer (because it has
> to operate with less information) but an advantage to the producer
> (because it has less information to deliver), so things tend to even out.
>
> We can discuss adding 'added value' versions of iterable if there is
> information which is cheap for producers to compute and extremely useful
> to consumers. E.g.
>
> interface IterableWithCount<T> extends Iterable<T> {
> int size();
> }
>
> If the consumer absolutely has to know the number of elements in the set
> (e.g. the Tail function) then it should ask the producer for a list.
>
> Sometimes it is sufficient for the consumer to have an approximate
> count. If so ,we can introduce cost metrics into the expression model.
>
> Julian
>


--
This email message is for the sole use of the intended recipient(s) and
may contain confidential information. Any unauthorized review, use,
disclosure or distribution is prohibited. If you are not the intended
recipient, please contact the sender by reply email and destroy all
copies of the original message.
_______________________________________________
Mondrian mailing list
Mondrian (AT) pentaho (DOT) org
http://lists.pentaho.org/mailman/listinfo/mondrian