PDA

View Full Version : Guidelines for cache sizes.



kmuglurmath
08-07-2003, 05:12 PM
In mondrian.properties, there is a mondrian.rolap.CachePool.costLimit variable. How is this variable used? Are there any guidelines for setting the value for this? For a fact table that has over 5 million rows and 10 different dimensions in a star schema, what is the 'rule of thumb' I need to follow to set the cache size, so that multiple users hitting the same page can get results instantly without having to re-query the data from the database?
Please help!
Thanks.

jhyde
08-08-2003, 06:09 AM
In short, you need to come up with a set of queries, and set Mondrian's cache size such that once the cache is populated, you can re-run these queries without running any expensive SQL. But the cache must not be so large that Mondrian exceeds physical memory.
The tuning process is as follows. First, start up Mondrian with tracing enabled. Now come up with a set of representative queries (maybe open several web browsers, each running a different query). The key word here is 'representative'; if you execute a query which has a huge result (or intermediate set), you will obviously blow all previous results out of the cache, but you won't have learned anything.
As you run these queries, note which SQL statements Mondrian runs. If a SQL statement is executed, it indicates that the required information was not in the cache.
Also note the size of the Mondrian process (java.exe, if you are running Mondrian from Tomcat), which increases as the amount of cached data increases, and will stop increasing once Mondrian reaches its cache limit.
Obviously, the first time you run a query Mondrian will need to execute SQL to get the data. If the cache sized large enough, then the second time you run the query, it be able to execute the query from cache, and it won't execute any SQL. (If you drill down on a member, it may execute one or two inexpensive SQL statements to retrieve the child members. In my opinion, that is acceptable.)
Now run some other queries, and then execute your original query again. Does it require SQL access? If it does, this means that the other queries have caused the original data to be flushed from the cache, and this is an indication that your cache is too small.
The cache is too large if the Mondrian process grows too large. As a general principle, the process should be 50% of the real memory on your system or less. Any larger than that and the process will start to page into virtual memory. (On a Windows system, the 'PF delta' column in task manager is a good measure of this.)
Comments, anyone? I would love to hear about people's experiences tuning Mondrian in the real world.