Proposal: Action Output Caching



malyvelky
10-17-2008, 07:29 AM
Hello guys,
Thanks for the great work you've done so far! We've been using Pentaho for a few months, and now we're thinking about extending it to support caching of action outputs such as reports. I'd love to hear your opinion.

Motivation

Reports, charts and other BI outputs usually need to be refreshed only when the source data changes, which may happen only every few minutes, hours, or even days or months. Only in extremely rare cases do we need to refresh an output on each access, so most BI outputs are "generate once, view many times". Generating a BI output (executing the SQL/MDX query and processing the result set into a displayable output) is usually pretty resource intensive and takes considerable time. We could therefore greatly improve the performance perceived by users by caching those BI outputs and refreshing them only when it's really needed, instead of on every view. We could also handle a considerably larger number of (simultaneous) users, because most of them would just retrieve an already created and stored output.

Currently Pentaho doesn't support this.

Action Output Caching Implementation

I'd start with caching the final outputs (.html report, .pdf report...), as this gives a shorter response time than caching result sets, and it can later be extended to cache result sets as well.

As I want this to be as transparent as possible, I'd modify the Action Sequence processing to support caching. To apply it to all existing actions, this only requires replacing one class (or not even that if we use AOP). I'd:


Add an additional optional sub-element <cache> to <action-sequence> that would allow for setting of expiration etc.
Read default values (default expiration time, outputs that shall not be cached,...) from some config file.
Perhaps create a new cache service exposed as an Action that would support the operations 1) retrieve content(key), 2) cache content(key, content, expiration), and optionally 3) clear(), 4) remove(key). The key is somehow composed of the source action (path, ...) and its actual parameters. Based on the current time and the content's expiration, the retrieve operation may either return the cached content or drop the expired content (if that is not done automatically by the cache implementation) and return nothing. Internally it would use some existing cache implementation (preferably pluggable), as Hibernate does (e.g. EHCache, OSCache, SwarmCache, JBoss TreeCache).
During action sequence processing, try to retrieve cached content from the cache. If it is there, return it immediately to the caller.
At the end of action sequence processing, cache the newly generated output.
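
To make the steps above a bit more concrete, here is a minimal sketch of what such a cache service might look like. Everything in it is an assumption for illustration only: the class name ActionOutputCache, the method names, the key format and the in-memory ConcurrentHashMap store are made up and are not existing Pentaho APIs. A real implementation would more likely delegate to a pluggable cache such as EHCache.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch of the proposed cache service; not a Pentaho API. */
class ActionOutputCache {

    /** Cached bytes plus the absolute time (in ms) at which they expire. */
    private static final class Entry {
        final byte[] content;
        final long expiresAtMillis;

        Entry(byte[] content, long expiresAtMillis) {
            this.content = content;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    // Stand-in for a pluggable cache implementation (assumption).
    private final Map<String, Entry> store = new ConcurrentHashMap<>();
    private final LongSupplier clock;

    ActionOutputCache() {
        this(System::currentTimeMillis);
    }

    /** The clock is injectable so that expiration can be exercised in tests. */
    ActionOutputCache(LongSupplier clock) {
        this.clock = clock;
    }

    /**
     * Composes a key from the action path and its parameters. Parameters are
     * sorted by name so that their order does not matter. Values are not
     * escaped here, which a real implementation would have to handle.
     */
    static String buildKey(String actionPath, Map<String, String> params) {
        StringBuilder key = new StringBuilder(actionPath);
        char separator = '?';
        for (Map.Entry<String, String> p : new TreeMap<>(params).entrySet()) {
            key.append(separator).append(p.getKey()).append('=').append(p.getValue());
            separator = '&';
        }
        return key.toString();
    }

    /** 1) retrieve content(key): returns nothing if missing or expired. */
    Optional<byte[]> retrieve(String key) {
        Entry entry = store.get(key);
        if (entry == null) {
            return Optional.empty();
        }
        if (clock.getAsLong() >= entry.expiresAtMillis) {
            store.remove(key, entry); // drop only the expired entry we looked at
            return Optional.empty();
        }
        return Optional.of(entry.content);
    }

    /** 2) cache content(key, content, expiration given as time-to-live in ms). */
    void cache(String key, byte[] content, long ttlMillis) {
        store.put(key, new Entry(content, clock.getAsLong() + ttlMillis));
    }

    /** 3) clear() */
    void clear() {
        store.clear();
    }

    /** 4) remove(key) */
    void remove(String key) {
        store.remove(key);
    }

    /**
     * The last two steps combined: return the cached output if present,
     * otherwise run the generator (the real action sequence) and cache its result.
     */
    byte[] getOrGenerate(String key, long ttlMillis, Supplier<byte[]> generator) {
        Optional<byte[]> cached = retrieve(key);
        if (cached.isPresent()) {
            return cached.get();
        }
        byte[] fresh = generator.get();
        cache(key, fresh, ttlMillis);
        return fresh;
    }
}
```

The action sequence processor would then call something like getOrGenerate(buildKey(path, params), ttl, generator), with the TTL coming from the <cache> sub-element or the configured default. Note that this simple version does not stop two simultaneous requests from both generating the same output on a cache miss; whether that matters depends on how expensive the report is.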

codek
10-17-2008, 08:18 AM
On the face of it this does sound like a good idea. After discussing it here, I'd recommend you raise a JIRA.

Actually, I think something like this has already been raised. I saw this issue:

http://jira.pentaho.com/browse/BISERVER-1416

Which is similar, but not quite exactly the same.

Of course, if you have any reports based on MDX, then you can utilise the existing caching engine available there.

Dan