Is Pentaho for Hadoop fast?



genesis
11-04-2010, 10:35 AM
I was interested in Pentaho for Hadoop because I thought it might provide quick front-end access to Hadoop data. By quick, I was hoping it would be faster than MapReduce. If a user were to select an element in a dropdown list on a GUI, I was hoping results might return in seconds instead of minutes (or longer).

But I was dreaming, right? From what I saw in a Pentaho video presentation, they use Hive, which itself kicks off MapReduce jobs. So Pentaho for Hadoop still wouldn't be feasible for ad-hoc or "real-time" reporting, right? It's just a prettier way to run daily/nightly Hadoop batch jobs.

Please confirm or correct me if I'm wrong. Thank you very much.

jtcornelius
11-04-2010, 11:02 AM
You are correct. Regardless of the design GUIs and visualization tools we place on top of it, the performance characteristics of Hadoop (and hence Hive) are not ideal for all business intelligence use cases. Real-time reporting and interactive analysis (OLAP) are examples of use cases where the batch-processing design of Hadoop is not a good technology fit. What Pentaho can provide for these use cases is a very simple way of spinning off data marts: you take a slice of the data in Hadoop and put it in the hands of business users for ad hoc query and reporting (Q&R) or OLAP analysis. I would recommend checking out the video series by James Dixon, CTO of Pentaho, on rational ways of leveraging Hadoop for data integration and business intelligence. The video series can be seen here: http://www.pentaho.com/hadoop/resources.php
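
To make the data mart idea concrete, here is a minimal sketch in Python. It is not Pentaho's implementation or API. The table name, the sample rows, and the helper functions are all hypothetical, and SQLite stands in for whatever database would actually host the mart. The assumption is that a nightly batch job has already exported a small slice of the Hadoop data, so a dropdown selection in a GUI only hits the small indexed table and never starts a MapReduce job.

```python
import sqlite3

# Hypothetical slice: rows a nightly batch job might have exported from Hadoop.
BATCH_EXPORT = [
    ("2010-11-01", "east", 120),
    ("2010-11-01", "west", 80),
    ("2010-11-02", "east", 150),
    ("2010-11-02", "west", 95),
]


def build_data_mart(rows):
    """Load the exported slice into a small, indexed relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE daily_sales (day TEXT, region TEXT, orders INTEGER)"
    )
    conn.executemany("INSERT INTO daily_sales VALUES (?, ?, ?)", rows)
    # Index the column the GUI filters on so lookups stay fast.
    conn.execute("CREATE INDEX idx_region ON daily_sales (region)")
    conn.commit()
    return conn


def orders_for_region(conn, region):
    """Answer an interactive query (e.g. a dropdown choice) from the mart."""
    cur = conn.execute(
        "SELECT COALESCE(SUM(orders), 0) FROM daily_sales WHERE region = ?",
        (region,),
    )
    return cur.fetchone()[0]


mart = build_data_mart(BATCH_EXPORT)
print(orders_for_region(mart, "east"))
```

The trade-off in this pattern is freshness: the mart is only as current as the last batch export, which is why it suits ad hoc analysis on a slice and not true real-time reporting.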

Also, you may want to review some of our other videos on Agile BI, which focus more on how you can rapidly design and deploy BI solutions with Pentaho.

Hope this helps,
Jake