
Thread: Several questions about Pentaho Mondrian

  1. #1
    Join Date
    May 2013
    Posts
    3

    Default Several questions about Pentaho Mondrian

    I'm new to Pentaho Mondrian and trying it out for my company's proof-of-concept work.
    We are dealing with data at a very large scale (billions of rows), running on Greenplum.
    However, I've hit a couple of roadblocks and would appreciate answers to the questions below:

    1. Is there any tool out there that could assist in transforming our current DB schema to Mondrian's OLAP schema?
    2. Given the example MDX query below:
      Code:
      SELECT
          {[Measures].[Visits]} ON COLUMNS,
          NON EMPTY {
              HEAD(
                  DESCENDANTS(
                      [Location],
                      [Location].[Segment Name],
                      AFTER
                  ),
                  3
              )
          } ON ROWS
      FROM [Location Metrics]
      WHERE (
          [Location].[Segment Name].[Bla Bla Bla],
          [Location].[Period].[Weekly],
          [Location].[Location Type].[Website],
          [Location].[Industry Name].[Ferrari Owners]
      )
      In the Tomcat log files, the generated SQL query has neither the WHERE clause nor the LIMIT 3. So my question is: do this filtering and limiting/offsetting happen in the DB or in the OLAP server layer? If it's the latter, what about tables with billions of rows?

  2. #2
    Join Date
    Jan 2013
    Posts
    521

    Default

    For (2), I'm assuming you mean you have billions of rows in your fact table(s), right? In the example you have above, the HEAD function is evaluated in Mondrian. MDX functions like HEAD act on sets of dimension members, which usually have a more manageable size. You'll typically see SQL queries for just the dimension members fired first, followed by one or more queries joining the fact table to the dimension tables to aggregate the measures. The fact table queries should not be unconstrained in the same way.
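
    To make that concrete, here is a rough sketch of the two kinds of SQL you would typically see in the log. All table and column names here are hypothetical, just to illustrate the pattern:

    Code:
    ```sql
    -- 1) Member query: load the relevant dimension members first.
    -- This touches only the (small) dimension table.
    select distinct location_dim.segment_name
    from location_dim
    order by location_dim.segment_name;

    -- 2) Fact query: aggregate measures, constrained to the members
    -- actually needed, so the billion-row fact table is not scanned
    -- unfiltered.
    select location_dim.segment_name,
           sum(fact_visits.visits) as visits
    from fact_visits
    join location_dim
      on fact_visits.location_id = location_dim.location_id
    where location_dim.segment_name in ('A', 'B', 'C')
    group by location_dim.segment_name;
    ```

    The HEAD(..., 3) is then applied by Mondrian to the member list from the first query, and only those members end up in the IN list of the second.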

    All of that said, Mondrian *does* attempt to push some MDX function evaluation down to the DB where possible. Take a look at mondrian.native.* properties in the config guide for more details (http://mondrian.pentaho.com/document...figuration.php)

  3. #3
    Join Date
    May 2013
    Posts
    3

    Default

    Quote Originally Posted by mcampbell View Post
    For (2), I'm assuming you mean you have billions of rows in your fact table(s), right? In the example you have above, the HEAD function is evaluated in Mondrian. MDX functions like HEAD act on sets of dimension members, which usually have a more manageable size. You'll typically see SQL queries for just the dimension members fired first, followed by one or more queries joining the fact table to the dimension tables to aggregate the measures. The fact table queries should not be unconstrained in the same way.

    All of that said, Mondrian *does* attempt to push some MDX function evaluation down to the DB where possible. Take a look at mondrian.native.* properties in the config guide for more details (http://mondrian.pentaho.com/document...figuration.php)
    Thanks for your reply! You mentioned that the HEAD function is evaluated in Mondrian; what about TOPCOUNT, SUBSET, etc. for limiting results? Are all of them executed in the Mondrian layer rather than in the DB?

    What if our dataset is too large to fit in the memory we have — is there any way to set a limit on the results returned?

    I've looked at the documentation link you sent; the closest thing I can find is mondrian.native.NativizeMaxResults, but we don't want to set this for all our queries. Besides, it throws a LimitExceededDuringCrossjoin error and may halt execution.

    Appreciate your reply and help.

  4. #4
    Join Date
    Jan 2013
    Posts
    521

    Default

    I was actually thinking of properties like mondrian.native.crossjoin.enable and mondrian.native.topcount.enable. These tell Mondrian (where possible) to push extra filtering down to the database when evaluating these functions.
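
    For reference, these are plain entries in mondrian.properties (shown here with what I believe are their default values):

    Code:
    ```properties
    # Push CrossJoin / TopCount / Filter / NonEmpty evaluation
    # down to SQL where Mondrian supports it
    mondrian.native.crossjoin.enable=true
    mondrian.native.topcount.enable=true
    mondrian.native.filter.enable=true
    mondrian.native.nonempty.enable=true
    ```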

    Can I safely assume that the bulk of your data is in the fact table(s)? If so, a good thing to look at is aggregate tables. Well designed aggregate tables based on your expected usage can reduce the number of times Mondrian needs to go back to those billion row tables.

    For functions like SUBSET/HEAD, keep in mind that they are acting on dimensional data. Your dimension tables are hopefully much smaller than your fact tables. The mondrian.native properties become particularly important if you have a large number of dimensions with 10s or 100s of thousands of members, or users who like to create deeply nested reports. You should definitely leave them on (they're on by default), but often the performance focus of very large warehouses is on improving fact query time by designing effective aggregates.
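
    For illustration, an aggregate table is declared inside the fact table's <Table> element in the Mondrian schema, roughly like this (the table and column names here are made up for the example):

    Code:
    ```xml
    <Table name="fact_visits">
        <!-- Pre-aggregated rollup of fact_visits by segment -->
        <AggName name="agg_weekly_visits">
            <AggFactCount column="fact_count"/>
            <AggMeasure name="[Measures].[Visits]" column="visits_sum"/>
            <AggLevel name="[Location].[Segment Name]" column="segment_name"/>
        </AggName>
    </Table>
    ```

    You also need mondrian.rolap.aggregates.Use=true and mondrian.rolap.aggregates.Read=true in mondrian.properties for Mondrian to pick the aggregate up.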

  5. #5
    Join Date
    May 2013
    Posts
    3

    Default

    I've checked my mondrian.properties file and these properties are already set to true, but in the SQL debug log I don't see the corresponding generated SQL queries, which leads me to think they are evaluated in the Mondrian layer instead of the DB. Please correct me if I'm wrong?

    Yes, our dimensional data is much smaller than the fact tables, but I suppose this will still require a huge amount of memory to run Mondrian. We've been hitting the "java.lang.OutOfMemoryError: GC overhead limit exceeded" often. Is this limit controlled by Mondrian or by the Tomcat server? If so, where would we configure it?

    Thanks again, really appreciate you taking your time to help me better understand Mondrian.

  6. #6

    Default

    Quote Originally Posted by ktleow View Post
    We've been hitting the "java.lang.OutOfMemoryError: GC overhead limit exceeded" often. Is this limit controlled by Mondrian or by the Tomcat server? If so, where would we configure it?
    That is the JVM layer, so find and adjust Java's -Xmx parameter. You can set this via the JAVA_OPTS environment variable, for example.

    Also consider checking the -XX:-UseGCOverheadLimit parameter.
    Keep in mind as well that on a 32-bit OS, Java can use no more than 2 or 4 GB (the exact value depends on the OS), so setting -Xmx8g will not do the trick even if you physically have that much RAM.
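
    If you're running Mondrian inside Tomcat, one common way to set this is a bin/setenv.sh file, which catalina.sh picks up automatically on startup. A sketch (adjust the heap size to your hardware):

    Code:
    ```shell
    # bin/setenv.sh -- sourced by catalina.sh before Tomcat starts.
    # 4 GB heap; only meaningful on a 64-bit JVM.
    export CATALINA_OPTS="$CATALINA_OPTS -Xmx4g -XX:-UseGCOverheadLimit"
    ```
    
    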

  7. #7
    Join Date
    Dec 2009
    Posts
    609

    Default

    Quote Originally Posted by pled76 View Post
    Keep in mind as well that on a 32-bit OS, Java can use no more than 2 or 4 GB (the exact value depends on the OS), so setting -Xmx8g will not do the trick even if you physically have that much RAM.
    I wonder whether anyone would even try to put 8 GB of RAM into a 32-bit machine.

  8. #8

    Default

    Quote Originally Posted by TomS View Post
    I wonder if someone would even try to put 8 GByte of RAM into a 32-bit machine
    You can do this with PAE: http://en.wikipedia.org/wiki/Physical_Address_Extension

  9. #9
    Join Date
    Jan 2013
    Posts
    521

    Default

    As pled76 mentioned, you can bump up the heap with -Xmx. I think it's worth raising it somewhat and doing some profiling to see whether you can get decent performance once there is enough heap space.

    Another thing worth looking at is mondrian.native.ExpandNonNative. It sounds like the queries you are running are too complex for default native evaluation. ExpandNonNative tries to natively evaluate sub-expressions within the sets on the axes, which increases the number of cases where sets can be simplified, at the cost of additional fact table queries. Given the size of your fact tables this may be too costly, but it's worth trying.
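
    In mondrian.properties that would look something like the following. I've also included mondrian.result.limit, which puts a hard cap on the size of the result set and can act as a memory safety valve for the OutOfMemoryError you mentioned — note it aborts the query with a resource-limit error rather than silently truncating, so treat it as a guard, not a paging mechanism:

    Code:
    ```properties
    # Try native evaluation of sub-expressions inside axis sets
    mondrian.native.ExpandNonNative=true

    # Hard cap on result set size; 0 (the default) means unlimited
    mondrian.result.limit=500000
    ```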
