PDA

View Full Version : [Mondrian] Mondrian 2.3 and nulls



Reidmiller, Chuck
03-30-2007, 04:51 PM
I posted the following message to the Mondrian community forum a week
ago:



========================================================================
==============================================

Greetings,

We have a test database that includes some tables that have null data.
We ran some MDX queries against this database using Mondrian 2.3 RC1 and
noticed that about half of the queries produced different results than
we received with 2.2. I have not yet been able to create a standard
Mondrian test case for FoodMart that reproduces the problem, but it can
be demonstrated via these steps:

1) run these 2 INSERTS to add a row with a null promotion_name and a
fact table row that references this promotion row:

insert into promotion values(7777, 120, NULL, 'TV', 77, '1996-01-10',
'1996-01-12');
insert into sales_fact_1997 values(22, 395, 78, 7777, 11, 6.87, 3.1, 3);

2) run this simple query:

select {[Store].[All Stores].[USA].Children} ON COLUMNS,
NON EMPTY {[Promotions].Children} ON ROWS
from [Sales]

The result on 2.2 shows a value of 3 for the null promotion in the
Oregon stores, which I believe is correct. The result on 2.3 shows a
different value for Oregon, and also a value for California, though
there is no data applicable to California.

Any help with tracking down this issue and producing a standard
regression test would be appreciated.

Thanks, Chuck



Since that posting, I have been able to work on this issue and believe
that I have found the problem. The code in Aggregation.loadKeys() has
been changed for 2.3 to this:



int loadKeys(SortedSet<Comparable<?>> valueSet, boolean hasNull)
{

this.hasNull = hasNull;

int size = valueSet.size();

this.keys = valueSet.toArray(new Comparable<?>[size]);

for (int i = 0; i < size; i++) {

mapKeyToOffset.put(keys[i], i);

}

if (hasNull) {

mapKeyToOffset.put(RolapUtil.sqlNullValue, size);

++size;

}

return size;

}



This code sets the keys array from the incoming valueSet, which is then
used to populate the mapKeyToOffset hash map. In the event of a null
value, the hash map is subsequently updated and gets out of sync with
the keys array, which is not updated with the null. The Segment.load()
method gets an incorrect value when it calls axes[j].getKeys().length
and duplicate key values are then used for denseDatasets (or
sparseDatasets), leading to invalid query results. Here is a modified
version of the method:



int loadKeys(SortedSet<Comparable<?>> valueSet, boolean hasNull)
{

this.hasNull = hasNull;

int size = valueSet.size();

if (hasNull) {

valueSet.add(RolapUtil.sqlNullValue);

++size;

}

this.keys = valueSet.toArray(new Comparable<?>[size]);

for (int i = 0; i < size; i++) {

mapKeyToOffset.put(keys[i], i);

}

return size;

}



and here is the test case that I used:



public void testNullPromotionName() {

final String cubeName = "Sales_inline";

TestContext testContext = TestContext.create(

null,

"<Cube name=\"" + cubeName + "\">\n" +

" <Table name=\"sales_fact_1997\"/>\n" +

" <DimensionUsage name=\"Time\" source=\"Time\"
foreignKey=\"time_id\"/>\n" +

" <DimensionUsage name=\"Store\" source=\"Store\"
foreignKey=\"store_id\"/>\n" +

" <Dimension name=\"Alternative Promotion\"
foreignKey=\"promotion_id\">\n" +

" <Hierarchy hasAll=\"true\"
primaryKey=\"promotion_id\">\n" +

" <InlineTable alias=\"alt_promotion\">\n" +

" <ColumnDefs>\n" +

" <ColumnDef name=\"promotion_id\"
type=\"Numeric\"/>\n" +

" <ColumnDef name=\"promotion_district_id\"
type=\"Numeric\"/>\n" +

" <ColumnDef name=\"promotion_name\"
type=\"String\"/>\n" +

" <ColumnDef name=\"media_type\"
type=\"String\"/>\n" +

" <ColumnDef name=\"cost\"
type=\"Numeric\"/>\n" +

" </ColumnDefs>\n" +

" <Rows>\n" +

" <Row>\n" +

" <Value column=\"promotion_id\">0</Value>\n"
+

" <Value
column=\"promotion_district_id\">120</Value>\n" +

" <Value column=\"media_type\">TV</Value>\n"
+

" <Value column=\"cost\">3</Value>\n" +

" </Row>\n" +

" <Row>\n" +

" <Value column=\"promotion_id\">1</Value>\n"
+

" <Value
column=\"promotion_district_id\">120</Value>\n" +

" <Value
column=\"promotion_name\">Promo1</Value>\n" +

" <Value
column=\"media_type\">Radio</Value>\n" +

" <Value column=\"cost\">5</Value>\n" +

" </Row>\n" +

" </Rows>\n" +

" </InlineTable>\n" +

" <Level name=\"Alternative Promotion\"
column=\"promotion_name\" uniqueMembers=\"false\"/> \n" +

" </Hierarchy>\n" +

" </Dimension>\n" +

" <Measure name=\"Unit Sales\" column=\"unit_sales\"
aggregator=\"sum\"\n" +

" formatString=\"Standard\" visible=\"false\"/>\n"
+

" <Measure name=\"Store Sales\" column=\"store_sales\"
aggregator=\"sum\"\n" +

" formatString=\"#,###.00\"/>\n" +

"</Cube>",

null, null, null);



testContext.assertQueryReturns(

"select {[Store].[All Stores].[USA].Children} ON COLUMNS,\n"
+

" NON EMPTY {[Alternative Promotion].Children} ON
ROWS\n" +

"from [" + cubeName + "] ",

fold("Axis #0:\n" +

"{}\n" +

"Axis #1:\n" +

"{[Store].[All Stores].[USA].[CA]}\n" +

"{[Store].[All Stores].[USA].[OR]}\n" +

"{[Store].[All Stores].[USA].[WA]}\n" +

"Axis #2:\n" +

"{[Alternative Promotion].[All Alternative
Promotions].[#null]}\n" +

"Row #0: 51,471\n" +

"Row #0: 52,469\n" +

"Row #0: 91,508\n"));

}



I stuck the test case in CompatibilityTest for convenience since I
modeled it on the other null tests in that class, but it may be more
appropriate elsewhere. Could Julian or someone else in the developer
community review this work and include it if it passes muster?



Regards, Chuck


_______________________________________________
Mondrian mailing list
Mondrian (AT) pentaho (DOT) org
http://lists.pentaho.org/mailman/listinfo/mondrian