Hitachi Vantara Pentaho Community Forums

Thread: [Mondrian] Mondrian Performance Test Harness

  1. #1
    Guest

    Default [Mondrian] Mondrian Performance Test Harness

    I wanted to describe an idea for creating a Mondrian Performance Test
    Harness and see if there was any input from the community. This is more
    than "maybe I'll get around to this" - the company I work for has a
    relationship with the CS department of a nearby university, and I've
    arranged to use this as a senior design project for a student team this
    semester.



    Here's the concept... Because we use Mondrian in a performance sensitive
    application (is there some other kind?), I wanted to figure out a way to
    have better regression test coverage of performance and throughput. What
    I have in mind is:



    * A new test database besides FoodMart with a scalable data generator
    (able to generate different size databases).

    * A Mondrian schema for this database.

    * A set of MDX queries that exercise the engine.

    * A JMeter test script to send these queries as XMLA requests as a
    single or multi-threaded workload.
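    As a rough illustration of the last item, here is a minimal sketch of
    what such a workload sends, written in Python rather than as a JMeter
    script. The envelope follows the standard XMLA Execute shape; the
    function names (build_execute_envelope, make_http_sender, run_workload)
    and the endpoint URL, catalog and DataSourceInfo values passed to them
    are assumptions for illustration, not part of Mondrian.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from time import perf_counter
from xml.sax.saxutils import escape

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
XMLA_NS = "urn:schemas-microsoft-com:xml-analysis"


def build_execute_envelope(mdx, catalog, data_source_info):
    """Wrap an MDX statement in an XMLA Execute SOAP envelope."""
    return (
        f'<soap:Envelope xmlns:soap="{SOAP_NS}">'
        "<soap:Body>"
        f'<Execute xmlns="{XMLA_NS}">'
        f"<Command><Statement>{escape(mdx)}</Statement></Command>"
        "<Properties><PropertyList>"
        f"<DataSourceInfo>{escape(data_source_info)}</DataSourceInfo>"
        f"<Catalog>{escape(catalog)}</Catalog>"
        "<Format>Multidimensional</Format>"
        "</PropertyList></Properties>"
        "</Execute>"
        "</soap:Body>"
        "</soap:Envelope>"
    )


def make_http_sender(url, catalog, data_source_info):
    """Return a send(mdx) function that POSTs the envelope to `url` (assumed endpoint)."""
    def send(mdx):
        body = build_execute_envelope(mdx, catalog, data_source_info)
        request = urllib.request.Request(
            url,
            data=body.encode("utf-8"),
            headers={
                "Content-Type": "text/xml",
                "SOAPAction": '"urn:schemas-microsoft-com:xml-analysis:Execute"',
            },
        )
        with urllib.request.urlopen(request) as response:
            return response.read()
    return send


def run_workload(queries, send, threads=1):
    """Run every query through `send` on `threads` workers.

    Returns the elapsed seconds for each query, in input order.
    """
    def timed(mdx):
        start = perf_counter()
        send(mdx)
        return perf_counter() - start

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(timed, queries))
```

    Passing threads=1 gives the single-threaded case. Because send is just a
    callable, the timing loop can be tried out with a stub before pointing
    it at a real XMLA servlet.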



    I was thinking of using TPC-DS (http://www.tpc.org/tpcds/default.asp) as
    the database and data generator. I can't tell if this benchmark is still
    being worked on, but there's a download with a set of tools. The data
    model contains multiple fact tables, so it should be possible to
    exercise virtual cubes, which is important to me.



    Any questions or suggestions?



    --Jeff Wright


    _______________________________________________
    Mondrian mailing list
    Mondrian (AT) pentaho (DOT) org
    http://lists.pentaho.org/mailman/listinfo/mondrian

  2. #2
    Julian Hyde Guest

    Default RE: [Mondrian] Mondrian Performance Test Harness

    Jeff,

    Thanks for raising this. It's something we have needed for a long time.

    See my comments inline.


    Jeff wrote:


    >I wanted to describe an idea for creating a Mondrian Performance Test
    >Harness and see if there was any input from the community. This is more than
    >"maybe I'll get around to this" - the company I work for has a relationship
    >with the CS department of a nearby university, and I've arranged to use this
    >as a senior design project for a student team this semester.

    >Here's the concept... Because we use Mondrian in a performance sensitive
    >application (is there some other kind?), I wanted to figure out a way to
    >have better regression test coverage of performance and throughput. What I
    >have in mind is:

    >* A new test database besides FoodMart with a scalable data generator (able
    >to generate different size databases).

    Agreed. A scalable database generator is essential. FoodMart is too small
    for serious performance testing, and no one wants to download a 10GB
    database. Ergo, we need a generator.

    >* A Mondrian schema for this database.

    >* A set of MDX queries that exercise the engine.

    >* A JMeter test script to send these queries as XMLA requests as a single or
    >multi-threaded workload.

    >I was thinking of using TPC-DS (http://www.tpc.org/tpcds/default.asp) as the
    >database and data generator. I can't tell if this benchmark is still being
    >worked on, but there's a download with a set of tools. The data model
    >contains multiple fact tables, so it should be possible to exercise virtual
    >cubes, which is important to me.

    As far as I can tell TPC-DS is not being actively used. Others have
    discussed the issue of which benchmark to use. The most popular seems to be
    the Star Schema benchmark. See e.g.
    http://www.mysqlperformanceblog.com/...chmark-infobright-infinidb-and-luciddb/,
    which has a discussion of the merits of various benchmarks. I'm fairly sure
    that someone has created a Mondrian schema for the Star Schema benchmark.



    I would like to include this performance suite in Mondrian's distribution as
    an optional set of tests. That implies that you should provide a fairly easy
    way to load the data set onto any database and instructions for how to set
    up the test harness.



    The other thing that I would like is the ability to do performance
    regression tests. That is, run the same tests regularly on the source code,
    and detect when a developer makes a change that damages performance. Test
    running times contain random noise -- a test might run slower one day for a
    variety of reasons such as cosmic rays striking the hard disk drive -- and
    so the test infrastructure would need to report degradations only when several
    runs of the test have been significantly slower. This would entail keeping a
    database of historic performance stats, and computing, say, the standard
    deviation of each number.
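    One way to read that rule, as a minimal sketch: flag a test only when
    each of its last few runs is slower than the historical mean by more
    than some number of standard deviations. The function name
    is_regression and the defaults (2 standard deviations, 3 consecutive
    slow runs) are assumptions chosen for illustration, not values proposed
    in this thread.

```python
from statistics import mean, stdev


def is_regression(history, recent, sigmas=2.0, min_slow_runs=3):
    """Return True only if the last `min_slow_runs` timings in `recent` are
    all slower than mean(history) + sigmas * stdev(history)."""
    if len(history) < 2 or len(recent) < min_slow_runs:
        return False  # not enough data to judge either way
    threshold = mean(history) + sigmas * stdev(history)
    return all(t > threshold for t in recent[-min_slow_runs:])
```

    A single slow run never trips the check, so one noisy day is ignored,
    while a sustained slowdown is reported once enough runs have
    accumulated.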



    A framework for performance regression testing -- say as an extension to
    JUnit -- could be a nice research/open source project for someone.



    Julian





  3. #3
    Guest

    Default RE: [Mondrian] Mondrian Performance Test Harness

    >As far as I can tell TPC-DS is not being actively used. Others have
    >discussed the issue of which benchmark to use. The most popular seems to be
    >the Star Schema benchmark. See e.g.
    >http://www.mysqlperformanceblog.com/...a-bechmark-infobright-infinidb-and-luciddb/,
    >which has a discussion of the merits of various benchmarks. I'm fairly sure
    >that someone has created a Mondrian schema for the Star Schema benchmark.




    Good references, I've seen those. SSB is a star schema model with a single
    fact table. I'd prefer to be able to test virtual cubes. I came down on
    the side of TPC-DS because it seemed more meaty as a data model and
    there was code to support it, even if it wasn't finished or actively
    used as a TPC benchmark.



    But I'm still open to ideas, and it could turn out that the code doesn't
    really work for TPC-DS.



    >A scalable database generator is essential




    Unfortunately both SSB and TPC-DS use data generators written in C, but
    we may be able to take on porting that to Java as scope for this
    semester or a follow on project.
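    To make "scalable" concrete, here is a minimal sketch of a generator
    whose output size is driven by a scale factor and whose contents are
    reproducible from a seed. It does not reproduce the TPC-DS or SSB
    generators; the row layout, BASE_FACT_ROWS and customers_per_unit are
    made-up values for illustration only.

```python
import random

BASE_FACT_ROWS = 1000  # hypothetical row count at scale factor 1


def generate_fact_rows(scale_factor, seed=0, customers_per_unit=100):
    """Yield (row_id, customer_id, quantity, price) tuples.

    Row count and the customer key range both grow linearly with
    `scale_factor`; the same seed always yields the same rows.
    """
    rng = random.Random(seed)
    n_customers = customers_per_unit * scale_factor
    for row_id in range(BASE_FACT_ROWS * scale_factor):
        customer_id = rng.randrange(n_customers)
        quantity = rng.randint(1, 10)
        price = round(rng.uniform(1.0, 100.0), 2)
        yield (row_id, customer_id, quantity, price)
```

    Because the rows come from a seeded generator, two runs at the same
    scale factor produce the same database, which keeps timings comparable
    between runs.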



    >The other thing that I would like is the ability to do performance
    >regression tests...



    Agreed. In addition to the points you raise, I'm also interested in the
    question of how to get a multi-user throughput test that doesn't
    degenerate into responding to all queries out of cache. Both of these
    benchmarks have some provision for parameterized queries. I would be
    curious to see some experiments on whether randomly parameterized
    queries would return consistent throughput measurements.
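    A minimal sketch of what randomly parameterized queries could look like:
    fill an MDX template from fixed parameter domains using a seeded random
    generator, so the workload varies between queries but is repeatable
    between runs. The template and member names below are borrowed from
    FoodMart purely as placeholders, and generate_queries is a hypothetical
    helper, not something either benchmark provides.

```python
import random
from string import Template

# Placeholder template; a real harness would use the new test schema.
QUERY_TEMPLATE = Template(
    "SELECT {[Measures].[Unit Sales]} ON COLUMNS, "
    "{[Store].[USA].[$state].Children} ON ROWS "
    "FROM [Sales] WHERE [Time].[$year].[$quarter]"
)

PARAMETER_DOMAINS = {
    "state": ["CA", "OR", "WA"],
    "year": ["1997"],
    "quarter": ["Q1", "Q2", "Q3", "Q4"],
}


def generate_queries(template, domains, count, seed):
    """Return `count` queries with parameters drawn at random from `domains`.

    The same seed always produces the same sequence of queries.
    """
    rng = random.Random(seed)
    keys = sorted(domains)  # fixed order keeps the sequence reproducible
    return [
        template.substitute({key: rng.choice(domains[key]) for key in keys})
        for _ in range(count)
    ]
```

    Running the same seed against two builds gives both the same query
    sequence, which is one way to check whether throughput numbers are
    consistent. Whether the parameter domains are wide enough to defeat the
    cache is a separate question that would need measuring.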



    --jeff



    From: mondrian-bounces (AT) pentaho (DOT) org [mailto:mondrian-bounces (AT) pentaho (DOT) org]
    On Behalf Of Julian Hyde
    Sent: Monday, August 23, 2010 2:35 PM
    To: 'Mondrian developer mailing list'
    Subject: RE: [Mondrian] Mondrian Performance Test Harness




  4. #4
    Nicholas Goodman Guest

    Default Re: [Mondrian] Mondrian Performance Test Harness

    On Aug 23, 2010, at 12:37 PM, <jeff.s.wright (AT) thomsonreuters (DOT) com> wrote:
    > Good references, I
