Hitachi Vantara Pentaho Community Forums

Thread: Caching in Pentaho Reporting

  1. #1

    Caching in Pentaho Reporting

    Hi everybody,
    Previously I worked with the Microsoft BI stack, where SSRS has a caching technique that improves report performance and shortens the time required to retrieve a report.

    Is there any way we can set up caching in Pentaho manually?
    I have created a report that reads data from Hive and renders it in Pentaho Report Designer (PRD). The query is a plain SELECT statement that fetches only 500 rows, but PRD takes approximately one minute to render the report.
    If anybody can suggest how to enhance the report rendering performance for my scenario, it would be a great help.

    Thanks in advance,
    Praxy

  2. #2
    Join Date
    Mar 2003
    Posts
    8,085


    Hive is not (repeat NOT) an interactive database. A map-reduce algorithm is not exactly a good match for the requirements of a database query. Caching is a part of your database. Hive explicitly states that it is NOT A DATABASE and that it DOES NOT PERFORM ANY CACHING.

    You should consider a classical data warehouse approach that holds your aggregated data. Our ETL tools and our Analysis server may help you to solve that problem.
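    To make that concrete, here is a minimal sketch of such a batch job, assuming a hypothetical HiveServer2 at hive-host:10000 and a made-up sales_daily summary table in a PostgreSQL warehouse; all hosts, credentials, and table names are illustrative, and PDI would normally do this for you without hand-written code:

    Code:
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Batch-job sketch: run the slow Hive aggregation once, then load the
    // result into a small summary table in a classical SQL warehouse so
    // that reports query the fast relational copy instead of Hive.
    public class HiveToWarehouse {
        public static void main(String[] args) throws Exception {
            try (// HiveServer2 via Hive's JDBC driver (must be on the classpath).
                 Connection hive = DriverManager.getConnection(
                         "jdbc:hive2://hive-host:10000/default", "etl", "");
                 // Any ordinary relational database as the reporting target.
                 Connection dwh = DriverManager.getConnection(
                         "jdbc:postgresql://dwh-host:5432/reporting", "etl", "secret")) {

                // The expensive map-reduce aggregation runs once per batch window...
                String extract = "SELECT sale_date, region, SUM(amount) AS total "
                               + "FROM sales GROUP BY sale_date, region";

                // ...and its result replaces the summary table the reports read.
                try (Statement wipe = dwh.createStatement()) {
                    wipe.executeUpdate("DELETE FROM sales_daily");
                }
                try (Statement src = hive.createStatement();
                     ResultSet rs = src.executeQuery(extract);
                     PreparedStatement ins = dwh.prepareStatement(
                             "INSERT INTO sales_daily (sale_date, region, total) VALUES (?, ?, ?)")) {
                    while (rs.next()) {
                        ins.setDate(1, rs.getDate("sale_date"));
                        ins.setString(2, rs.getString("region"));
                        ins.setBigDecimal(3, rs.getBigDecimal("total"));
                        ins.addBatch();
                    }
                    ins.executeBatch();
                }
            }
        }
    }

    The report's SQL data source then points at sales_daily, where a plain SELECT over a few hundred pre-aggregated rows returns in well under a second.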
    Get the latest news and tips and tricks for Pentaho Reporting at the Pentaho Reporting Blog.

  3. #3
    Join Date
    Jul 2007
    Posts
    2,498


    CDA supports caching, so that would be possible to achieve!


    I'm sure CDA will be supported as a valid data source in the near future, so that you could do what you describe.
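    For the curious, CDA declares caching per query in its .cda descriptor file. A minimal sketch, with a made-up query against the SampleData JNDI connection from the Pentaho demo (cacheDuration is in seconds):

    Code:
    <?xml version="1.0" encoding="UTF-8"?>
    <CDADescriptor>
      <DataSources>
        <!-- Illustrative connection; SampleData ships with the Pentaho demo. -->
        <Connection id="1" type="sql.jndi">
          <Jndi>SampleData</Jndi>
        </Connection>
      </DataSources>
      <!-- cache="true" keeps the result set around for cacheDuration seconds. -->
      <DataAccess id="salesByRegion" connection="1" type="sql"
                  access="public" cache="true" cacheDuration="3600">
        <Query>SELECT region, SUM(amount) AS total FROM sales GROUP BY region</Query>
      </DataAccess>
    </CDADescriptor>

    Every client that requests the same data access within that hour gets the cached result set instead of re-running the query.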
    Pedro Alves
    Meet us on ##pentaho, a FreeNode irc channel

  4. #4
    Join Date
    Mar 2003
    Posts
    8,085


    Sure, and then you are into cache-management hell: when to wipe the cache, how to handle the huge queries that are insane but happen anyway, how to treat BLOBs and CLOBs ...
    Caching can be good, sure, but it is nothing that can be implemented in five minutes. CDA works because the queries involved are small, as no one would come up with the idea of dumping 400,000 rows of data into a dashboard. But for PRD this requirement is way too common to be ignored.
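    Just to illustrate how fast those questions pile up (a made-up toy, not anything from the reporting engine), even the most naive query-result cache has to answer the wipe and size questions on day one:

    Code:
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Toy query-result cache: a TTL plus a hard entry cap. Even this
    // naive version runs straight into the cache-management questions above.
    public class NaiveResultCache<K, V> {
        private static final int MAX_ENTRIES = 100; // huge result sets would blow the heap

        private static final class Entry<T> {
            final T value;
            final long storedAt;
            Entry(T value, long storedAt) { this.value = value; this.storedAt = storedAt; }
        }

        private final long ttlMillis;
        private final Map<K, Entry<V>> cache =
                new LinkedHashMap<K, Entry<V>>(16, 0.75f, true) { // access order = LRU eviction
                    @Override
                    protected boolean removeEldestEntry(Map.Entry<K, Entry<V>> eldest) {
                        return size() > MAX_ENTRIES;
                    }
                };

        public NaiveResultCache(long ttlMillis) {
            this.ttlMillis = ttlMillis;
        }

        public synchronized V get(K key) {
            Entry<V> e = cache.get(key);
            if (e == null) {
                return null;
            }
            if (System.currentTimeMillis() - e.storedAt > ttlMillis) {
                cache.remove(key); // stale: the "when to wipe the cache" problem
                return null;
            }
            return e.value;
        }

        public synchronized void put(K key, V value) {
            // Already unanswered: should a 400,000-row result be stored at all,
            // and what about BLOB/CLOB columns that don't fit on the heap?
            cache.put(key, new Entry<V>(value, System.currentTimeMillis()));
        }
    }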

    And the question dealt with Hive, where all of this misses the point entirely. Let me quote the Hive website:

    What Hive is NOT

    Hadoop is a batch processing system, and Hadoop jobs tend to have high latency and incur substantial overheads in job submission and scheduling. As a result, latency for Hive queries is generally very high (minutes) even when the data sets involved are very small (say a few hundred megabytes). As a result it cannot be compared with systems such as Oracle, where analyses are conducted on a significantly smaller amount of data but proceed much more iteratively, with the response times between iterations being less than a few minutes. Hive aims to provide acceptable (but not optimal) latency for interactive data browsing, queries over small data sets or test queries. Hive also does not provide any sort of data or query caching to make repeated queries over the same data set faster.

    Hive is not designed for online transaction processing and does not offer real-time queries and row level updates. It is best used for batch jobs over large sets of immutable data (like web logs). What Hive values most are scalability (scale out with more machines added dynamically to the Hadoop cluster), extensibility (with MapReduce framework and UDF/UDAF/UDTF), fault-tolerance, and loose-coupling with its input formats.
    The bold parts are by me. If you use Hive because it sounds cooler to have a Google-style cluster than a plain old database, then you are doing it wrong. And if you have a terabyte of data in Hive and expect reasonable performance from your reports, you are either rich (owning a supercomputer) or in for some great surprises. Hive is not the right tool to base a reporting solution on. Hive can have its merits if you use it as a storage system from which to run data mining or from which to feed your data warehouse (which should reside in a classical SQL database).
    Get the latest news and tips and tricks for Pentaho Reporting at the Pentaho Reporting Blog.

  5. #5
    Join Date
    Apr 2007
    Posts
    2,010


    Classical SQL database? Pah; move with the times, you mean a "modern column-oriented database"!

  6. #6
    Join Date
    Jul 2007
    Posts
    2,498


    Thomas, is that your way of telling us there will be no CDA input in PRPT? :'(
    Pedro Alves
    Meet us on ##pentaho, a FreeNode irc channel

  7. #7
    Join Date
    Mar 2003
    Posts
    8,085


    @Pedro: There will be a CDA input, there's no doubt about it. But the topic here was: will we have a data caching layer in the reporting engine? And the answer to that is clearly NO (at least not unless I get a rather huge contribution to integrate). CDA may have caching, but so does Mondrian, and so does your ordinary SQL database. If certain low-tech databases do not have a caching system for their data structures, then my take on it is simple: that's bad luck; get off your MS Access installation and get a real database that is suitable for the job.

    @Codek: You are talking with a JDK 1.2 fan. For me there is no such thing as a modern database. If it ain't 20 years old, how can it be proven to do the job? And no one needs more than a dBase IV system anyway.
    Get the latest news and tips and tricks for Pentaho Reporting at the Pentaho Reporting Blog.

  8. #8


    Guys, thanks for your valuable inputs...
    Thanks, Taqua, for the quick response; I really appreciate it...

  9. #9
    Join Date
    Apr 2007
    Posts
    2,010


    Good point - in fact, what's wrong with good old text files?
