Hitachi Vantara Pentaho Community Forums

Thread: [Mondrian] RE: Spatial index

  1. #1
    Julian Hyde Guest

    [Mondrian] RE: Spatial index

    Luc, Will,

    I've checked in a version that captures the requirements as I see them. It
    includes a method to identify sets of segments (I've called them
    SpatialRegions) that can be rolled up.

    http://perforce.eigenbase.org:8080/@...src/main/mondrian/util/SpatialValueTree2.java

    I have re-introduced the distinction between 'segment requests' (called
    SpatialRegionRequest; for each dimension, the set of values to retrieve) and
    'segments' (called SpatialRegion; the actual set of values).
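
    The request/region distinction can be illustrated with a minimal sketch.
    The class and method names below (RequestSketch, RegionSketch, want, hold,
    satisfies) are hypothetical and are not the API in SpatialValueTree2.java;
    values are plain strings only to keep the example short.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical stand-in for a segment request: per dimension, the values wanted. */
final class RequestSketch {
    final Map<String, SortedSet<String>> wanted = new HashMap<>();

    RequestSketch want(String dimension, String... values) {
        wanted.put(dimension, new TreeSet<>(Arrays.asList(values)));
        return this;
    }
}

/** Hypothetical stand-in for a segment: per dimension, the values actually held. */
final class RegionSketch {
    final Map<String, SortedSet<String>> held = new HashMap<>();

    RegionSketch hold(String dimension, String... values) {
        held.put(dimension, new TreeSet<>(Arrays.asList(values)));
        return this;
    }

    /** True if the dimensions match and every requested value is held. */
    boolean satisfies(RequestSketch request) {
        if (!held.keySet().equals(request.wanted.keySet())) {
            return false;
        }
        for (Map.Entry<String, SortedSet<String>> e : request.wanted.entrySet()) {
            if (!held.get(e.getKey()).containsAll(e.getValue())) {
                return false;
            }
        }
        return true;
    }
}
```

    In this sketch a region holding years {1997, 1998} and nations {Canada,
    USA} satisfies a request for (1997, USA) but not one for (1999, USA).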

    For performance, I have imposed several constraints: SpatialDimensions are
    created by the tree itself and have contiguous, unique ordinals; values
    implement Comparable; arrays of values are sorted and unique, and the
    client will not modify them. We can add more constraints if they improve
    performance.
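
    As a rough illustration of why such constraints help, here is a sketch
    under stated assumptions. TreeSketch, DimensionSketch, createDimension,
    checkSortedUnique and contains are hypothetical names, not taken from
    SpatialValueTree2.java. Contiguous ordinals allow dimensions to index
    plain arrays, and sorted, unique value arrays allow binary search.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical dimension; the constructor is meant to be called by the tree only. */
final class DimensionSketch {
    final int ordinal;
    final String name;

    DimensionSketch(int ordinal, String name) {
        this.ordinal = ordinal;
        this.name = name;
    }
}

/** Hypothetical tree that hands out dimensions with ordinals 0, 1, 2, ... */
final class TreeSketch {
    private final List<DimensionSketch> dimensions = new ArrayList<>();

    DimensionSketch createDimension(String name) {
        // The next ordinal is the current count, so ordinals never have gaps.
        DimensionSketch dimension = new DimensionSketch(dimensions.size(), name);
        dimensions.add(dimension);
        return dimension;
    }

    /** Rejects any array that is not strictly ascending (sorted and unique). */
    static <T extends Comparable<T>> T[] checkSortedUnique(T[] values) {
        for (int i = 1; i < values.length; i++) {
            if (values[i - 1].compareTo(values[i]) >= 0) {
                throw new IllegalArgumentException("values must be sorted and unique");
            }
        }
        return values;
    }

    /** Membership test in O(log n); valid only because the array is sorted. */
    static <T extends Comparable<T>> boolean contains(T[] sortedValues, T value) {
        return Arrays.binarySearch(sortedValues, value) >= 0;
    }
}
```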

    Conversely, if there are things in this API that mondrian does not need,
    let's remove them.

    There are other general design notes in the javadoc comments at the head of
    the class.

    Developers,

    I would like to have this discussion in a public forum, so we might as well
    start now. Briefly, this is what we are trying to achieve...

    The goal is to design a data structure that will allow us to manage
    thousands, possibly millions, of segments containing cell values. Currently
    mondrian stores segments in a linked list for each dimensionality (e.g. all
    segments for (year, month, nation) are in the same list). But that won't be
    tenable when (a) we have an external cache with many thousands of segments,
    and (b) we want to answer a cell request by rolling up coarser granularity
    segment(s).

    Our design process is to (a) propose an API, (b) write unit tests for
    functionality, performance and thread-safety, and (c) write several
    implementations of the API, and may the best one win.

    Julian


    _____

    From: Luc Boudreau [mailto:lboudreau (AT) pentaho (DOT) com]
    Sent: Monday, April 18, 2011 9:05 PM
    To: Will Gorman; Julian Hyde
    Subject: Spatial index



    As discussed, I've sketched a first draft of a proposed spatial index.



    http://perforce.eigenbase.org:8080/@...src/main/mondrian/util/SpatialValueTree.java



    If you could give it a quick read and tell me if there is anything I might
    have missed or overlooked, I could then start writing a bunch of unit tests.



    Thanks!



    Luc


    _______________________________________________
    Mondrian mailing list
    Mondrian (AT) pentaho (DOT) org
    http://lists.pentaho.org/mailman/listinfo/mondrian

  2. #2
    Luc Boudreau Guest

    Re: [Mondrian] RE: Spatial index

    Julian,

    I like the approach you took. It does make more sense to encapsulate the
    region and region query specs.

    Concerning your Javadoc review comment on get(SpatialRegionRequest
    regionRequest), I agree that we should remove the method for now. It would
    be hard to implement and does not provide any functionality we need right
    now.

    As for rollup(Map<SpatialDimension, Object> dimensions), I think there
    might be an issue with it. Right now, it would only allow us to look up one
    cell at a time. The API we need must be able to look up all the regions
    necessary to populate a region "superset". Maybe I'm not understanding the
    API correctly.

    Luc


    On Tue, Apr 19, 2011 at 4:39 PM, Julian Hyde <jhyde (AT) pentaho (DOT) com> wrote:

    > Luc, Will,
    >
    > I've checked in a version that captures the requirements as I see them. It
    > includes a method to identify sets of segments (I've called them
    > SpatialRegions) that can be rolled up.
    >
    >
    > http://perforce.eigenbase.org:8080/@...alueTree2.java
    >
    > I have re-introduced the distinction between 'segment requests' (called
    > SpatialRegionRequest; for each dimension, the set of values to retrieve) and
    > 'segments' (called SpatialRegion; the actual set of values).
    >
    > For performance, I have made all kinds of constraints, e.g.
    > SpatialDimensions are created by the tree itself, have contiguous and unique
    > ordinals; values implement Comparable; arrays of values are sorted and
    > unique and the client will not modify them. We can add more constraints, if
    > it improves performance.
    >
    > Conversely, if there are things in this API that mondrian does not need,
    > let's remove them.
    >
    > There are other general design notes in the javadoc comments at the head of
    > the class.
    >
    > Developers,
    >
    > I would like to have this discussion in a public forum, so we might as well
    > start now. Briefly, this is what we are trying to achieve...
    >
    > The goal is to design a data structure that will allow us to manage
    > thousands, possibly millions, of segments containing cell values. Currently
    > mondrian stores segments in a linked list for each dimensionality (e.g. all
    > segments for (year, month, nation) are in the same list). But that won't be
    > tenable when we (a) have an external cache with many thousands of segments,
    > (b) we want to answer a cell request by rolling up coarser granularity
    > segment(s). Our design process is to (a) propose an API, (b) write unit
    > tests for functionality, performance and thread-safety, (c) write several
    > implementations of the API, and may the best one win.
    >
    > Julian
    >
    > ------------------------------
    > *From:* Luc Boudreau [mailto:lboudreau (AT) pentaho (DOT) com]
    > *Sent:* Monday, April 18, 2011 9:05 PM
    > *To:* Will Gorman; Julian Hyde
    > *Subject:* Spatial index
    >
    > As discussed, I’ve sketched a first draft at a proposed spatial index.
    >
    >
    >
    >
    > http://perforce.eigenbase.org:8080/@rev1=head@//open/mondrian/src/main/mondrian/util/SpatialValueTree.java
    >
    >
    >
    > If you could give it a quick read and tell me if there is anything I might
    > have missed or overlooked, I could then start writing a bunch of unit tests.
    >
    >
    >
    > Thanks!
    >
    >
    >
    > Luc
    >



  3. #3
    Julian Hyde Guest

    RE: [Mondrian] RE: Spatial index

    Luc wrote:

    > As for the rollup(Map<SpatialDimension, Object> dimensions), I think there
    > might be an issue with it. Right now, it would only allow us to lookup one
    > cell at a time. The API we need must be able to lookup all the regions
    > necessary to populate a region "superset". Maybe I'm not understanding the
    > API right.

    The idea is that if you ask for one cell, it will try to create a segment,
    by rolling up, that contains that cell.

    What about the next cell that mondrian needs? There are no guarantees, but
    there is a good chance that the next cell would be in that segment too.
    Computing a segment via rollup is so cheap (in terms of CPU and memory) that
    we might as well just do it, then see if the segment is useful. It's simpler
    than looking at thousands of cell requests and trying to create the optimal
    set of segments for them.

    The return from rollup is a list of SpatialRegions to combine. Those could
    be input to some procedure that generates the rolled up segment. I guess
    you'd also have to tell the procedure the desired dimensionality of the
    rolled up segment. Then it would create the largest rolled up segment that
    it can.
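
    A procedure of that kind might look like the following sketch. It is not
    mondrian's implementation: RollupSketch and its rollup method are
    hypothetical, cells are keyed by lists of string coordinates, and the
    measure is assumed to be additive (a sum). It also assumes the input
    regions do not overlap and together cover every value of the dimensions
    being rolled away; otherwise the totals would be wrong.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RollupSketch {
    /**
     * Combines the given regions into one rolled-up segment.
     *
     * @param regions cell values keyed by coordinates, one map per region
     * @param keep    positions of the coordinates that form the desired
     *                dimensionality; all other coordinates are summed away
     */
    static Map<List<String>, Double> rollup(
            List<Map<List<String>, Double>> regions, int... keep) {
        Map<List<String>, Double> result = new LinkedHashMap<>();
        for (Map<List<String>, Double> region : regions) {
            for (Map.Entry<List<String>, Double> cell : region.entrySet()) {
                // Project the cell's coordinates onto the kept dimensions.
                List<String> key = new ArrayList<>();
                for (int position : keep) {
                    key.add(cell.getKey().get(position));
                }
                // Cells that share a projected key are added together.
                result.merge(key, cell.getValue(), Double::sum);
            }
        }
        return result;
    }
}
```

    For example, rolling (year, nation) cells up to (year) by passing
    keep = 0 adds the values of all nations within each year.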

    (Possibly that segment wouldn't be 'rectangular' (actually
    hyper-rectangular). In which case, should the roll up process 'cut off the
    corners' to produce a rectangle? Or should it produce multiple segments? I
    don't know the answer yet, but I think the indexing API I've proposed is
    giving us the right information, anyway.)
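
    To make 'rectangular' concrete: a set of cells is hyper-rectangular when
    it equals the cross product of the values it uses on each dimension. The
    sketch below tests that property by counting. RectangleSketch and
    isHyperRectangular are hypothetical names, and coordinates are strings for
    brevity.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RectangleSketch {
    /**
     * Returns true if the cells form a full cross product of their
     * per-dimension values. Every cell lies inside that cross product, so the
     * two sets are equal exactly when their sizes are equal.
     */
    static boolean isHyperRectangular(Set<List<String>> cells, int dimensionCount) {
        List<Set<String>> axes = new ArrayList<>();
        for (int d = 0; d < dimensionCount; d++) {
            axes.add(new HashSet<>());
        }
        // Collect the distinct values used on each dimension.
        for (List<String> cell : cells) {
            for (int d = 0; d < dimensionCount; d++) {
                axes.get(d).add(cell.get(d));
            }
        }
        long product = 1;
        for (Set<String> axis : axes) {
            product *= axis.size();
        }
        return product == cells.size();
    }
}
```

    With cells (1997, USA), (1997, Canada) and (1998, USA) the check fails,
    because (1998, Canada) is missing. 'Cutting off the corner' would mean
    dropping cells until the check passes; producing multiple segments would
    mean splitting the set into pieces that each pass.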

    Julian



Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.