Bursting in Pentaho

02-09-2006, 06:25 AM
I have had a look at the Action Sequence bursting example. It seems to me that this is not bursting at all. It looks like the same report is run against the database as many times as needed, and the result of each iteration is sent out. Have I misunderstood the action sequence file?

Isn't the point of bursting that you hit the database just once and then break the result into chunks based on parameters?


02-09-2006, 11:45 AM

The term burst comes from the days of line printers that used fan-folded continuous-feed paper. A giant report would be generated and the sections (representing some level of grouping) were manually torn apart and delivered to the appropriate person. The act of this manual separation was called 'bursting', meaning 'to tear apart', and was usually accomplished by flicking the middle finger at the perforation and pulling the pages apart.

So yes - strictly speaking you are correct – the term bursting was coined to describe breaking one report into many. The problem with this simple approach is you can’t tear a PDF, XLS or even HTML into chunks very nicely.

The closest way to simulate this is – make one query, iterate through the result-set one time and call the report engine for each chunk of result. After each chunk, get the list of recipients for that section and email them. In some scenarios this process may be useful and it is possible to build an Action Sequence to do a burst this way. We did not include a sample of this with the demo because it provides very little value over the paper method. It just saves your burst finger and a bunch of walking.
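As a rough illustration of the single-pass approach just described, here is a minimal Python sketch. It is not how the Pentaho sample works; the row layout, the `burst` helper, and the `render` and `deliver` callbacks are all hypothetical stand-ins for the query result, the report engine call, and the email step. It assumes the rows arrive already sorted by the burst key (for example via ORDER BY).

```python
from itertools import groupby
from operator import itemgetter


def burst(rows, key, render, deliver):
    """Walk one sorted result set once, producing one report per section."""
    sent = []
    # groupby only groups adjacent rows, so rows must be sorted by `key`.
    for section, chunk in groupby(rows, key=itemgetter(key)):
        report = render(section, list(chunk))  # stand-in for the report engine
        deliver(section, report)               # stand-in for the email step
        sent.append(section)
    return sent


# Hypothetical result of a single query, sorted by region.
rows = [
    {"region": "East", "dept": "Sales", "total": 100},
    {"region": "East", "dept": "Ops", "total": 40},
    {"region": "West", "dept": "Sales", "total": 70},
]

outbox = {}


def render(section, chunk):
    # Toy "report": the section name and its summed total.
    return f"{section}: {sum(r['total'] for r in chunk)}"


def deliver(section, report):
    # Toy "email": record the report instead of sending it.
    outbox[section] = report


sections = burst(rows, "region", render, deliver)
```

Each section is rendered and delivered on its own, which is exactly why a recipient covering several sections ends up with several separate emails.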

The down side of this approach is - if someone were responsible for several regions or departments, they would receive multiple emails containing one section of the report and have to aggregate any data between them by hand.

The process could be improved by dynamically generating the query and letting a “where” or “in” clause filter the data returned to just the sections that will be burst, generate a report for each section, store that report somewhere and create an email for each recipient containing only the sections they are interested in. This is a little better because we only return the data we need, iterate through the data one time, and send one email to each recipient. The down side is - the recipient still has to deal with multiple report attachments and no totals between them. Also the report sections need to be temporarily stored somewhere and managed. It is possible to create an Action Sequence to satisfy this scenario.
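A small sketch of that filtered variant, under stated assumptions: the `sales` table, the `subscriptions` mapping, and both helper functions are invented for illustration, and an in-memory SQLite database stands in for the real server. The "in" clause is built from placeholders so that only the subscribed sections are fetched, and each recipient ends up with one bundle of sections.

```python
import sqlite3
from collections import defaultdict

# Hypothetical mapping of recipients to the sections they want.
subscriptions = {
    "alice@example.com": ["East", "West"],
    "bob@example.com": ["West"],
}


def build_query(sections):
    """Build a parameterised IN clause with one placeholder per section."""
    placeholders = ", ".join("?" for _ in sections)
    return (
        "SELECT region, dept, total FROM sales "
        f"WHERE region IN ({placeholders}) ORDER BY region"
    )


def burst_by_recipient(conn, subscriptions):
    """Query once for the needed sections, then bundle them per recipient."""
    wanted = sorted({s for secs in subscriptions.values() for s in secs})
    reports = defaultdict(list)  # temporary store for the section "reports"
    for region, dept, total in conn.execute(build_query(wanted), wanted):
        reports[region].append(f"{dept}={total}")
    # One "email" per recipient, holding only their sections.
    return {
        rcpt: {s: reports[s] for s in secs if s in reports}
        for rcpt, secs in subscriptions.items()
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, dept TEXT, total INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "Sales", 100), ("West", "Sales", 70), ("North", "Sales", 55)],
)
emails = burst_by_recipient(conn, subscriptions)
```

Nobody subscribes to the North section, so it is never fetched; the sections in each bundle are still separate reports with no totals across them.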

If the requirement is to generate a custom report for each recipient but still make only one database call, there are two options. One is for the report engine to read in the entire result set, store it internally, provide its own filtering and generate the reports. Even if this were a good idea, none of the report engines supported by Pentaho can do this. The other option is to stage the data somewhere else and hand the appropriate chunks to the report engine. It is possible to create an Action Sequence that stages the data internally while calling the report generator. The down side is you have copied all your data from the database server to the Pentaho server.

In our experience, good enterprise database servers like MySQL or Oracle with the right indexes, smart caching and good database drivers are much more efficient at doing the filtering and data manipulation than we are within the BI platform or within the reporting engine. We believe that hitting the server more times to get smaller result sets is actually more efficient than doing one big query and manipulating the data ourselves, and it certainly scales much better. I admit that the data we chose to use in the bursting example doesn’t show this very well since each manager only has one region and we assume they get all departments. I’ll make sure better bursting examples will be available in the next release.

I know this was very long winded, but I wanted to make sure to point out two things. First, we see bursting as delivering custom reports to the users that want them, in a format they need, when the data becomes available. Second, Action Sequences are meant to be very flexible and support the business need. The samples are simply one possible way to do things but definitely not the only way. By the way – we have, over the years, fallen into each of the traps mentioned above.


02-09-2006, 12:18 PM
Thanks very much for your detailed response. I especially appreciate the history, stuff like that is priceless so thanks.

My requirement is to provide the single database call type bursting for two reasons:-
1. It is how one of the mega-bucks competitors provides bursting (and that matters to my boss (why?))
2. Our database server for our transaction data is very, very heavily worked. There is a 10 hour period overnight that has to provide for an awful lot of reporting, loading and what else. That 10 hour period is so valuable that one hit is all we get.

We could do with more power at the database. We could do with better structured data but I don't have the power to influence that now and we have a lot of raw data (we do 3 billion ATM transactions a year). I did a short experiment today. We have a bunch of JasperReports that run. They are parameterised and a query is run to get a list of acquirers. A report is then run for each acquirer. There are about 40 acquirers. If I run one query to get all the data for all acquirers, the execution time on the database is only a tenth of what it is for the many queries.

I can afford the computational time on the BI platform but not on the database. My plan is to use JasperReports (or maybe JFreeReport) to create an XML output report, then use XPath/XSLT to burst it the middle finger way.
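A minimal sketch of what that XML bursting step might look like, written in Python with the standard library's limited XPath support. The XML layout here (a `report` root with one `acquirer` element per section) is purely an assumption for illustration and is not the actual JasperReports or JFreeReport output format; `burst_xml` is likewise a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical report layout: one <acquirer> element per section.
REPORT_XML = """<report>
  <acquirer id="A1"><txn amount="10"/><txn amount="5"/></acquirer>
  <acquirer id="B2"><txn amount="7"/></acquirer>
</report>"""


def burst_xml(xml_text):
    """Split one XML report into a standalone document per acquirer."""
    root = ET.fromstring(xml_text)
    pieces = {}
    # "./acquirer" selects the direct children that mark each section.
    for section in root.findall("./acquirer"):
        wrapper = ET.Element("report")  # fresh root for the torn-off piece
        wrapper.append(section)
        pieces[section.get("id")] = ET.tostring(wrapper, encoding="unicode")
    return pieces


pieces = burst_xml(REPORT_XML)
```

Each entry in `pieces` is a self-contained XML document for one acquirer, which could then be styled or converted separately without another database query.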

I'll let you know how I get on. If I create an Action Sequence to do it I'll let you know about that too.

Again, many thanks. I do think the Pentaho platform is a great achievement and I expect it to compete with some very expensive competition. I hope so, as I am backing Open Source in my company and I need to make it work.