PDA

View Full Version : Simpler reporting : make your data richer



mattcasters
11-30-2006, 01:11 AM
A lot of time, I hear discussions about which reporting tool is the easiest to use for certain special tasks. Most of the time, I just ignore these “threads” because it’s not my cup of tea as a developer of ETL solutions.
However, it has to be said, often the solution to complex reporting requirements is to be found in ETL.
When you find yourself struggling with complex reports that need any of the following:

compare different records
aggregate beyond simple sums and averages.
report on a non-existing records (report 0 sales, etc)
…Well, in those cases you need ETL.
Let’s take for example the case of the reporting on non-existing sales: how can you report that there has been 0 sales for a certain product during a certain week? Well, you can create an aggregate table in your ETL that contains the following:

Dimensions
Time (end of week)
Product (lowest level)
Facts
Sum of Sales for the product during the passed week.You will agree with me that if you create such a simple fact table, that the report based on it will also be simple. That’s not all. You get another benefit from this: you create the possibility to extend the fact table with other interesting data when it comes up. For example, in the same transformation that populates this aggregate fact table, you can add the following (examples):

Facts
Total Sales last month
Total Sales last quarter
Total Sales last half year
Total Sales Year-to-date
Total Sales ever
Difference in sales with last week
…You see, the initial investment of building the ETL job is not thrown away over time, you build upon the earlier effort and grow with the needs.
Now you might think that these metrics cost a lot in terms of CPU and I/O to calculate, but of-course that’s not true: you can just look at the numbers for the previous week and do a simple addition…..
http://www.kettle.be/images/blog-aggregation.png
The added advantage of doing it during the night in the ETL jobs is that the speed of the reports that run on this aggregate are bound to run very fast because no calculations need to take place afther the facts.
Finally, let me say that by constructing your datawarehouse like this (one fact table derived from another) you’re actually enriching your data, making it high quality information… which brings me to this famous quote as what we’re doing here is exactly the oposite:
“Where is the wisdom? Lost in the knowledge.
Where is the knowledge? Lost in the information.”
<span style="font-weight: bold" />
T.S. Eliot<span style="font-weight: bold" />

“Where is the information? Lost in the data.
Where is the data? Lost in the #@$%?!&#038; database.”

Joe Celko
Until next time,
Matt


More... (http://www.ibridge.be/?p=14)