PDA

View Full Version : Aggregation rule/Type and Formulas



DEinspanjer
12-06-2007, 10:46 AM
If I want to include the concept of a fact table measure that is the distinct count of a foreign key column, what exactly is the correct approach in the PME?

I've been reading documentation in the wiki, but I haven't found anything that really describes what the Aggregation rule / Aggregation Type property does or how to use it.

This wiki page (http://wiki.pentaho.org/display/studio/02.+Pentaho+Metadata+Formulas) indicates that there are aggregate functions supported in the Formula field including COUNT but it doesn't mention how that is supposed to relate to the Aggregation Type property. Also, as far as the COUNT function goes, would it work like standard SQL and be something like COUNT(DISTINCT [fk_column]) ?

MattCasters
12-06-2007, 01:02 PM
Suppose you change the formula to "COUNT(DISTINCT [fk_column]))"

Since it's a distinct count, the aggregation type should be set to "None".
The Field type should be a Fact, not a dimension.
Since you don't want to have the expression changed by the SQL generator, you check the "Is the formula exact?" option.
I think that ought to do it.

DEinspanjer
12-06-2007, 01:16 PM
Could you tell me anything about why putting a count distinct SQL clause in the formula and setting the aggregation type to none is correct in this case and when you might want to use the "distinct count" aggregation type instead?

I noticed that in order for the mondrian schema generation to work correctly, I had to do the latter, but putting a formula into the field caused the mondrian schema to not work.

MattCasters
12-06-2007, 03:22 PM
Hi Daniel,

I just wanted to explain to you how the SQL generation worked. If the count distinct aggregation works fine for you, then all the better ;-)
I didn't know you were working with the Mondrian Schema generator.

Setting the aggregation type to none is required since in general you can not add distinctive counts safely, (barring a few exceptions, as always)

I'm unfortunately no Mondrian genius. I'm scheduled to meet and brainstorm with Julian at the end of January about these things. If you feel like it, file bug reports against the schema generations and I'll try to take these issues with me or at least have them looked over by then.

The schema generator was written in one memorable (jet lagged) night in Orlando. The result was sent to Julian and he then advised me to put specific defaults on options like "Has all" (enabled in the top-level hierarchy). That was basically it.
That being said, despite being pretty basic, I really believe that model based schema generation is the way forward. I personally think it just takes to long to create a schema using the traditional tool set. (Cube designer for example) This also has the advantage that it can be done dynamically over time, over web interfaces and the like.

All the best,

Matt

DEinspanjer
12-06-2007, 03:46 PM
Having struggled with the Cube Designer for a while and then having tried out the schema generator, I very much prefer the latter. I think it should quickly become an official feature of the MDE.

I'll be happy to log some bugs against it for the obvious things I've run into so far (I didn't want to do that before determining if you wanted to see bugs on an easter egg. :)

I'm still left rather confused about what exactly the Aggregation rule / Aggregation Type property is used for in the MDE. Maybe that is just because I haven't used the resulting xpi in the ad-hoc report generator yet. If you care to say any more about it in general, I'd even be happy to take that information and update the wiki page with it.

MattCasters
12-06-2007, 04:58 PM
Well, suppose you have a column called "turnover" and the aggregation rule is "Sum" and the field type is "Fact".
In that case, the SQL generator will determine that in order to get the correct result, it needs to generate a "SUM(turnover)" expression.
It also knows that it has to generate a GROUP BY statement since it's an aggregation.
It groups over all the dimension columns for example.

There are a few rules like that in the SQL generator.

All the best,

Matt