US and Worldwide: +1 (866) 660-7555
Results 1 to 10 of 10

Thread: Getting different/wrong results in version 3.5 while correct results in 3.4

  1. #1

    Default Getting different/wrong results in version 3.5 while correct results in 3.4

    Hi all,
    I m getting correct results when i was worked with schema_workbench 3.4 and using my java code with mondrian.jar. (comapared with sql results). Now i switched to version 3.5, getting worng results and completely different results.

    MDX :
    select Measures.[Amount] on Axis(0),
    Filter (CrossJoin(
    CrossJoin(
    CrossJoin(
    CrossJoin( [Dim1].[val1], [Dim2].[val2]),
    [Dim3].[val3]),
    [Dim4].[val4]),
    [Dim5_Customers].Members),
    Measures.[Amount] > counter ) on Axis(1)
    From [Cube]
    where ([Time].[2013].[10].[1] : [Time].[2013].[10].[30])

    and also i mentioned Measure Expression in schema as dialect-"mysql" query as "case when tbl.amount > value then value else 0 end" aggreagte as 'sum'

    MDX filter condition is not evaluating exactly what i needed. After i traced the generated sql statements, it is like

    "having sum(case when tbl.amount > value then value else 0 end) > conter"

    I observed that formated sql is different from 3.4 to 3.5 (it is optimized using 'in').

    The object is to get the customers with in a time range with specified condition in mdx and base condition as measure expression in schema.

    Give me any suggesations to transform the above mdx to get acurate results.

    (I built an application it dynamicaly builds mdx with user parameters and gets executed and transform the required results to mysql tables. So for me query stucture is important.).

    Thanks in Advance.

  2. #2
    Join Date
    Jan 2013
    Posts
    512

    Default

    It sounds like you've hit a bug. If possible it would be great if you could reproduce using the sample foodmart data, and log a Jira case.

    Couple of things worth trying to try to narrow down the issue:
    1) Disable the *.native settings in mondrian.properties (set mondrian.native.crossjoin.enable=false and mondrian.native.nonempty.enable=false). Rerun your MDX and see if you see the same results. The query in your example uses a compound slicer (it's got a set of time members), and the way this gets handled is a little different with native evaluation.
    2) Instead of using a compound slicer, define a calculated member which aggregates the time range. E.g.

    with member Time.aggMember as 'Aggregate([Time].[2013].[10].[1] : [Time].[2013].[10].[30])'
    select Measures.[Amount] on Axis(0),
    Filter (CrossJoin(
    CrossJoin(
    CrossJoin(
    CrossJoin( [Dim1].[val1], [Dim2].[val2]),
    [Dim3].[val3]),
    [Dim4].[val4]),
    [Dim5_Customers].Members),
    Measures.[Amount] > counter ) on Axis(1)
    From [Cube]
    where Time.aggMember



  3. #3

    Default

    Hi, Thanks for the reply.

    Here is my mondrian.properties file options

    mondrian.foodmart.jdbcURL=jdbc:mysql://localhost:3306/StarTest?user=user&password=passwd
    mondrian.jdbcDrivers=com.mysql.jdbc.Driver
    driver.classpath=/path/to/mysql-connector-java-5.1.16.jar


    mondrian.rolap.maxConstraints=10000
    mondrian.rolap.aggregates.Use=true
    mondrian.rolap.aggregates.Read=true
    mondrian.rolap.generate.formatted.sql=true
    mondrian.rolap.nonempty=true
    mondrian.rolap.ignoreInvalidMembers=false
    mondrian.rolap.iterationLimit=0


    The same proporties file is used by both schemaworkbench 3.4, 3.5 aslo set -cp to my java application.
    System configuration:
    Os: Ubuntu 12.04
    RAM: 4 gb
    DataSet: 25 Lakh transactions in fact table

    I tried with your MDX as "with member time.agg ....", got exact results but taking so long time to display (its about 15 mins). But the worng results are getting in fraction of sec (say 500 mSec).
    How can i optimize this and it would be very greatful if you know any good reference for MDX to know in how many way we can do optimize and change the query structure.

    Thnaks & Regards
    LinuxBee

  4. #4
    Join Date
    Jan 2013
    Posts
    512

    Default

    Regarding MDX references, Spofford's MDX book is one that I've used. Your query has a pretty deeply nested CJ, and attempting to roll up a set of 30 members at each intersection can be expensive. In this particular case it would be much cheaper to compute the measure as
    (measures.[amount],[Time].[2013].[10]) - (measures.[amount],[Time].[2013].[10].[31])
    but that's not flexible if you're changing your time ranges.

    Did your initial query with the compound slicer return in a reasonable amount of time in 3.4? I'd like to understand what changed. Can you enable mondrian.mdx and mondrian.sql DEBUG logging in log4j.xml, run your tests in 3.5, and attach the output? Also, it could be worth trying against 3.6 as well.

  5. #5

    Default

    Hi, Here I am attaching the sql log files for both versions (3.4 and 3.5) on same schema and mdx.
    I have schema with degenrated dimension as fact_auto_incr_id (transaction_id), which has 25 lakh records and 2 dimension tables customer_id and account_id with 1.5 lakh records each.
    I m trying to get the customer_id, account_id and respective transaction_id's, by passing the rest of 6 dimension members as filter parameters. (Like [Dim1].[member1], [Dim1].[member3] .. ).

    While coming to the results ..
    with 3.5 got results in less than a sec (see execution time in file - 300 ms) -> results are wrong (observed that it is not considering the month - it is summing all the months where date in (16 - 30) )
    with 3.4 got in (6 mins.. for the first time, it was taken 8 mins and just re-executed it again then it comes to 6 mins) -> results are correct (should not get any customer_id for the filster value >500000)

    Can you suggest any optimisation for this. I tried to use "From slicing" (mentioned some where, as nested select mdx) like

    SELECT
    [Measures].Members ON 0,
    [Geography].[Geo].Members ON 1
    FROM
    (SELECT [Time].[Calendar].[Year].[2010] ON 0 FROM [Sales])

    but didn't work.

    Thanks & Regards
    LinuxBee
    Attached Files Attached Files

  6. #6
    Join Date
    Jan 2013
    Posts
    512

    Default

    In your 3.4 log file I see a large number of segment load queries, each for a different subset of the total data. Can you try setting mondrian.rolap.cellBatchSize to a very large number (say 100 million) and see what impact this has on 3.4? You can run a similar test on 3.5 with the aggregate member MDX I had suggested earlier.

    Apart from performance, the wrong numbers with 3.5 bother me more. If you could include the mdx logs as well for the 3.5 test that may help shed some light. Also, if you're able to test against 3.6, it would be good to know if this issue persists.

  7. #7

    Default

    Hi Matt,

    I test on 3.5 with the modified query and system memory configurations. Here I am attaching the results.

    System configuration:
    In Eclipse: Run Configuration as:
    -Xms1024m -Xmx2048m -XX:MaxPermSize=1024m (the last heap memory arg effects the total execution time).


    Mondrian Properties:

    Apart form the connection string and driver class path

    mondrian.rolap.aggregates.Use=true
    mondrian.rolap.aggregates.Read=true
    mondrian.rolap.ignoreInvalidMembers=false
    mondrian.rolap.iterationLimit=0
    mondrian.rolap.cellBatchSize=10000000
    mondrian.rolap.generate.formatted.sql=true




    log4j properties:

    Nothing change in this, just copied as it is from the modrian configuration page.


    Result time:

    for the first query: 80-120 sec

    next on-words it is taking half of the time i.e. 40-60sec


    Query format:

    with member [Time].[aggRange] as
    'aggregate([Time].[2013].[9].[16] : [Time].[2013].[9].[30]), Measures.[MaxTranAmt_50000])'"
    Member [Dim1].[Cust] As
    'aggregate({[Dim1].[1], [Dim1].[2]}), Measures.[MaxTranAmt_50000])'
    Select {Measures.[MaxTranAmt_50000]} on Axis(0),
    Non Empty Filter( [Customer Id].[Customer Id].Members, Measures.[MaxTranAmt_50000] > 200000 ) on Axis(1)
    From [CAS]
    Where ([Dim2].[C], Dim3].[1], [Dim4].[1], [Dim1].[Cust], [Time].[aggRange])




    Results are fine and correct. What I wanted to get from results is, customers and respective amounts or counts(the no of transactions that made the sum - How can i get this count). Thats why I fixed the dimensions that to get this values. Is this the efficient way to get or any other way. Still i want to reduce this time complexity, because, as I told you earlier the wrong results are getting in 98 mSec (with version 3.5).

    With small dataset (say 2 Lakh records) with 3.4 version is also pretty fast, like say in 300 mSec getting correct results.

    How much heap/cache memory it needs relatively with the data set size say 25 Lakh records (only fact table - 900mb) and 2 dimensions with (1.5 lakh records each) and other 6 dimension with negligible records say (10, 100, 500, .. < 1000).

    Yet to test on 3.6.

    Sorry for the late response and Thanks for the hint for mondrian properties(got worked).

    Thanks & Regards
    LinuxBee
    Attached Files Attached Files

  8. #8

    Default Tested on 3.6

    Hi mcampbell, Here I am attaching the test results on 3.6.
    This time, I reduced the dataset size and increased the no of slicer dimensions (aggregating with not in ).

    Query:

    with member [Time].[aggRange] as
    'aggregate(([Time].[2013].[9].[1] : [Time].[2013].[10].[31]), Measures.[MinTranAmt_200000])'
    Member [Dim 1].[Cust] As
    'aggregate({[Dim 1].[1], [Dim 1].[4], [Dim 1].[2], [Dim 1].[7]}, Measures.[MinTranAmt_200000])'
    Member [Dim 2].[Custm] As
    'aggregate({[Dim 2].[2], [Dim 2].[5], [Dim 2].[6], [Dim 2].[7]}, Measures.[MinTranAmt_200000])'
    Member [Dim 3].[Types] As
    'aggregate( Except ( {[Dim 3].Children}, " // [Dim 3].CurrentMember NOT IN
    {[Dim 3].[64], [Dim 3].[107], [Dim 3].[108], [Dim 3].[109], [Dim 3].[112],
    [Dim 3].[113], [Dim 3].[114]}), Measures.[MinTranAmt_200000])'
    Select {Measures.[MinTranAmt_200000]} on Axis(0),
    Non Empty Filter( CrossJoin( "
    {[Customer Id].[77698], [Customer Id].[133432]}, [Transaction Id].[Transaction Id].Members),Measures.[MinTranAmt_200000_Count] > 0 ) on Axis(1)
    From [CAS]
    Where ([Dim 4].[1], [Dim 5].[1], [Dim 2].[Custm], [Dim 1].[Cust], [Dim 3].[Types], [Time].[aggRange])

    Data Set:
    fact_table = ~3,00,000 transactions
    Dimension_tables:
    Dim 1:12, Dim 2: 8, Dim 3: 697 and
    Dim 4 & Dim 5 are degenerated dimensions with 2,3 distinct vals in fact_table
    transaction_id and Customer_id are also degenerated dim's with transaction_id as PK and customer_id's are nearly 2,00,000

    If i remove any aggregated member, it is doing well, (but takes lot time 50 sec even on 8G RAM, and data set is also not so large).

    In log file (attached here), query has only customer id on axis 1.

    The attached log query with same data set, getting results with version 3.6, but with 3.5 its just hangged up(didnt get any results for long time ).

    Expecting: In my case, I have include atleast 5 dimension members (some with NOT IN as shown above) in MDX and max 2 measures. Any suggestion to build mdx in a optimized way. Now I m trying to optimize database based on query specific.

    Thanks & Regards
    Linux Bee
    Attached Files Attached Files

  9. #9

    Default Unexpected Results with verion 3.6 ( using except and Not-In functions )

    Hi mcampbell, Sorry for the late reply (test result on version 3.6, I was away from the project).

    For the MDX with except/ not-in functions (which are applied on a dimension with 700 members) giving wrong results.

    Getting Unexpected results because of "Filter" is applying before "Where slicing " and also the date range.

    If the date range is between '2013-09-15' and '2013-10-15', it is looking for 2 month dates in generated sql, it is applying in "where" clause with other dimension members except one, which has ~700 members, which is aggregating with except().

    where
    :
    :
    and
    `tbl_time`.`year` = 2013
    and
    `tbl_time`.`month` in (9, 10)
    and
    `tbl_time`.`day` in (15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
    :
    :
    group by
    :

    MDX:
    with member [Time].[aggRange] as 'Aggregate(([Time].[2013].[9].[15] : [Time].[2013].[10].[15]))'
    member [Dim 1].[Cust] as 'Aggregate({[Dim 1].[1], [Dim 1].[4], [Dim 1].[2], [Dim 1].[7]})'
    member [Dim 2].[Custm] as 'Aggregate({[Dim 2].[2], [Dim 2].[5], [Dim 2].[6], [Dim 2].[7]})'
    member [Dim 3].[Types] as 'Aggregate(Except({[Dim 3].Children},
    {[Dim 3].[64], [Dim 3].[107], [Dim 3].[108], [Dim 3].[109], [Dim 3].[112], [Dim 3].[113], [Dim 3].[114]}))'
    select {[Measures].[MinTranAmt_200000], [Measures].[MinTranAmt_200000_Count]} ON COLUMNS,
    Filter([Customer Id].[Customer Id].Members, ([Measures].[MinTranAmt_200000_Count] > 5)) ON ROWS
    from [CAS]
    where ([Dim 4].[1], [Dim 5].[1], [Dim 2].[Custm], [Dim 1].[Cust], [Dim 3].[Types], [Time].[aggRange])

    Result:
    Axis #0:
    {[Dim 4].[1], [Dim 5].[1], [Dim 2].[Custm], [Dim 1].[Cust], [Dim 3].[Types], [Time].[aggRange]}
    Axis #1:
    {[Measures].[MinTranAmt_200000]}
    {[Measures].[MinTranAmt_200000_Count]}
    Axis #2:
    {[Customer Id].[77698]}
    {[Customer Id].[133432]}
    Row #0: 1,150,000
    Row #0: 3
    Row #1: 204,410
    Row #1: 1

    ==> in filter() it is mentioned as > 5. Here the counts are 3 and 1, which should not be displayed

    Except the [Dim5], all other are applied at this level (in the last but one sql query)

    select
    `tbl_sub_transactions`.`customer_id` as `c0`
    from
    `tbl_sub_transactions` as `tbl_sub_transactions`,
    `tbl_time` as `tbl_time`,
    `tbl_dim1` as `tbl_dim1`,
    `tbl_dim2` as `tbl_dim2`,
    `tbl_customer` as `tbl_customer`
    where
    `tbl_sub_transactions`.`dim4` = '1'
    and
    `tbl_sub_transactions`.`dim5` = 1
    and
    `tbl_sub_transactions`.`date_id` = `tbl_time`.`date_id`
    and
    `tbl_time`.`year` = 2013
    and
    `tbl_time`.`month` in (9, 10)
    and
    `tbl_time`.`day` in (15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
    and
    `tbl_sub_transactions`.`dim1` = `tbl_dim1`.`dim1_id`
    and
    `tbl_dim1`.`dim1_id` in (1, 4, 2, 7)
    and
    `tbl_sub_transactions`.`customer_id` = `tbl_customer`.`customer_id`
    and
    `tbl_customer`.`dim2` = `tbl_dim2`.`dim2`
    and
    `tbl_dim2`.`dim2` in (2, 5, 6, 7)
    group by
    `tbl_sub_transactions`.`customer_id`
    having
    (sum((case when `tbl_sub_transactions`.`base_trans_amt` >200000 then 1 else 0 end)) > 5)
    order by
    ISNULL(`tbl_sub_transactions`.`customer_id`) ASC, `tbl_sub_transactions`.`customer_id` ASC

    Here the sum()> 5 is passed, because it is applying on two months (day in (1- 30) and month in (9,10)).

    After this query it is applying the Dim 3 and showing the result.

    After all, the final result is correct if we consider the exact date between '2013-09-15' and '2013-10-15' (verified with native sql queries).


    -- > The same was happened with Not-in function

    MDX:
    with member [Time].[aggRange] as 'Aggregate(([Time].[2013].[9].[1] : [Time].[2013].[10].[31]))'
    member [Dim 1].[Cust] as 'Aggregate({[Dim 1].[1], [Dim 1].[4], [Dim 1].[2], [Dim 1].[7]})'
    member [Dim 3].[Custm] as 'Aggregate({[Dim 3].[2], [Dim 3].[5], [Dim 3].[6], [Dim 3].[7]})'
    member [Dim 4].[Types] as 'Aggregate(Filter([Dim 4].Children, (NOT ([Dim 4].CurrentMember IN {[Dim 4].[64], [Dim 4].[107], [Dim 4].[108], [Dim 4].[109], [Dim 4].[112], [Dim 4].[113], [Dim 4].[114]}))))'
    select {[Measures].[MinTranAmt_200000]} ON COLUMNS,
    Filter([Customer Id].[Customer Id].Members, ([Measures].[MinTranAmt_200000_Count] > 5)) ON ROWS
    from [CAS]
    where ([Dim 2].[1], [Dim 5].[1], [Dim 3].[Custm], [Dim 1].[Cust], [Dim 4].[Types], [Time].[aggRange])

    Result: same as above.

    In my view, It is counting all the member in date range 2013-9-1 to - 2013-10-30 which are > 5, after applying the filter condition, it is applying to exclude [Dim 4] members.

    Qns: How can I exclude some members of a dimension (which has ~700 members).
    Attached Files Attached Files
    Last edited by LinuxBee; 04-07-2014 at 07:24 AM.

  10. #10
    Join Date
    Jan 2013
    Posts
    512

    Default

    I believe I've identified a bug related to what you're seeing (MONDRIAN-1999). Can you try setting the native filter property to false to see whether you get expected results? I.e. put this in your mondrian.properties:

    mondrian.native.filter.enable=false


    Thanks for your persistence in testing this!

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •