Hitachi Vantara Pentaho Community Forums
Results 1 to 2 of 2

Thread: SQL Generation and Execution Problems

  1. #1
    Join Date
    May 2007
    Posts
    5

    Default SQL Generation and Execution Problems

    Via JSP servlet, the following request is executed:
    "SELECT { [Measures].[Sales]} ON AXIS (0),
    NON EMPTY {DESCENDANTS ([Customer].[All Customers], 1, SELF) } ON AXIS (1) FROM [Sales]
    WHERE ( [Year].[2006] ) CELL PROPERTIES VALUE"

    This request takes about 2 to 3 seconds. Two SQL statements are exectued as a result:

    "select "D_Dates"."fiscal_year" as "c0" from "D_Dates" as "D_Dates" where "D_Dates"."fiscal_year" = 2006 group by "D_Dates"."fiscal_year" order by "D_Dates"."fiscal_year" ASC"
    and
    "select "D_Customer"."cust_name" as "c0" from "D_Dates" as "D_Dates", "F_Sales" as "F_Sales", "D_Site" as "D_Site", "D_Job" as "D_Job", "D_Customer" as "D_Customer" where "F_Sales"."date_id" = "D_Dates"."date_id" and "D_Dates"."fiscal_year" = 2006 and "D_Job"."cust_id" = "D_Customer"."cust_id" and "D_Site"."job_id" = "D_Job"."job_id" and "F_Sales"."site_id" = "D_Site"."site_id" group by "D_Customer"."cust_name" order by "D_Customer"."cust_name" ASC"

    So far, no complaints.

    If we execute the same request via XMLA in Tabular mode, again the request take about 2 to 3 seconds with the same two SQL statements being sent to the target database.
    BUT....
    If the same MDX request is requested in Multidimensional mode, it takes about 30 to 40 seconds to execute the first time, and the number of SQL statements hitting the database is in the thousands.
    It would appear that Mondrian is requesting data for each individual element (in this case, [All Customers] has 6201 children all on the same level) and storing it in the cache. Asking the same request of course is much faster since the results are already in the cache.
    Is there a way to prevent Mondrian from issuing the thousands of SQL statements just like in Tabular Mode?

  2. #2
    Join Date
    Nov 2006
    Posts
    3

    Default Mondrian performance issues and possible solutions

    We have noticed that performance may be an issue under certain circumstances when in XMLA mode, due to the “DISPLAY_INFO” attribute, used to get cardinality of the next level.
    We noticed in our environment, SQLServer 2000 and 2005 (with default install parameters), returns the approximate cardinality(0 OR 1000 in, depending if there are any children or not) for this kind of request, whereas Mondrian return the exact child cardinality.

    Within our test environment, the Mondrian server generated 1450 SQL requests if using the “NON EMPTY” MDX key words, and 6201 requests for all the members when not using “NON EMPTY”.

    To reduce the amount of requests, we believe retrieving the cardinality when retrieving members would improve overall performance.

    Our Approach
    We have a possible solution that has addressed our problems and possibly others with regards to XMLA performance.

    Modified Files:

    olap/mondrian.dtd
    olap/Mondrian.xml
    olap/MondrianDef.java
    rolap/sql/RolapLevel.java
    rolap/sql/RolapMember.java
    rolap/sql/SqlMemberSource.java

    To retrieve the cardinality of a member, the DISPLAYINFO axis attribute by default would be dependant on cacheMemberCardinality value(added attribute for the Level element).

    If cacheMemberCardinality="false", (default value), executes an SQL query to the server and asks if there are any children, the present behaviour of Mondrian.

    If cacheMemberCardinality="true", we retrieve the exact children cardinality information at the same time as Mondrian issues the request to get the members. The performance gain is significant.

    MDX statement used for testing:
    SELECT { [Measures].[Net Sales] } ON AXIS (0) , NON EMPTY {DESCENDANTS ([Customer].[All Customers], 1, SELF) } ON AXIS (1) FROM [Sales Cube] CELL PROPERTIES VALUE

    Original generated SQL statement, before code modification, same as cacheMemberCardinality="false":
    select "D_Job"."job_Name" as "c0" from "D_Site" as "D_Site", "D_Job" as "D_Job", "D_Customer" as "D_Customer" where "D_Job"."cust_id" = "D_Customer"."cust_id" and "D_Site"."job_id" = "D_Job"."job_id" and ("D_Customer"."cust_name" ='Customer 1128 ') group by "D_Job"."job_Name" order by "D_Job"."job_Name" ASC


    Generated SQL statement, after code modification, same as cacheMemberCardinality="true":
    select "D_Job"."job_Name" as "c0" , Count("job_id") as "c1" from "D_Site" as "D_Site", "D_Job" as "D_Job", "D_Customer" as "D_Customer" where "D_Job"."cust_id" = "D_Customer"."cust_id" and "D_Site"."job_id" = "D_Job"."job_id" and ("D_Customer"."cust_name" = 'Customer 1128 ') group by "D_Job"."job_Name" order by "D_Job"."job_Name" ASC


    The second SQL statement, with the ‘Count’, took 0.1 second more (on the database side), than the original query. A negligible overhead when compared to the gain in global performance. Perhaps making this the default behaviour would be acceptable.

    The Customer dimension has 6201 entries.
    Before optimisation
    MONDRIAN SQL SSAS2005+2000
    90 Seconds 2.5 Seconds ( without NON EMPTY )
    30 Seconds 0.7 Seconds ( With NON EMPTY )
    After optimisation
    MONDRIAN SQL SSAS2005+2000

    9 Seconds 2.5 Seconds ( without NON EMPTY )
    3.5 Seconds 0.7 Seconds ( With NON EMPTY )

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.