View Full Version : Pentaho and Hive

07-18-2013, 05:48 PM
Hi guys, I am very new to Pentaho and I am considering using Pentaho in combination with Hive as an alternative to R.
Here is what I want to do: from Pentaho, submit a Hive query to a Hadoop cluster, where it is executed. Then I want to analyse the results of that query by computing the correlation between two columns of the result set, and I want to visualize it. I already managed to submit a Hive query from the Pentaho Report Designer, but I was not able to view the results in Pentaho, let alone visualize them. So my first question is: is it possible to use Pentaho for the use case described? And second: which of the Pentaho solutions should I download? It seems the Report Designer cannot analyse the results from Hive. So is it Pentaho Data Integration? Or Pentaho Big Data?

Help would be really appreciated.
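In case it helps to separate concerns: the correlation step itself is easy once the query results are in hand. Below is a minimal, self-contained Java sketch of a Pearson correlation over two columns. The sample arrays stand in for values you would actually fetch from a Hive JDBC `ResultSet` (the driver class and URL mentioned in the comment are the standard Apache Hive ones, not anything Pentaho-specific); the visualization part is not covered here.

```java
// Sketch: Pearson correlation between two numeric columns of a query result.
// In practice the arrays would be filled by iterating a java.sql.ResultSet
// obtained via the Hive JDBC driver (e.g. "org.apache.hive.jdbc.HiveDriver"
// with a URL like "jdbc:hive2://host:10000/default" for HiveServer2).
// Here they are hard-coded sample data so the snippet runs standalone.
public class HiveCorrelation {

    // Pearson correlation coefficient of two equal-length series.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            sx  += x[i];
            sy  += y[i];
            sxy += x[i] * y[i];
            sxx += x[i] * x[i];
            syy += y[i] * y[i];
        }
        double num = n * sxy - sx * sy;
        double den = Math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy));
        return num / den;
    }

    public static void main(String[] args) {
        // Stand-ins for two columns of a Hive query result.
        double[] colA = {1, 2, 3, 4, 5};
        double[] colB = {2, 4, 5, 4, 5};
        System.out.printf("r = %.4f%n", pearson(colA, colB));
    }
}
```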

07-19-2013, 07:53 AM
Are you serious, guys? Is nobody able to answer this question?

07-19-2013, 12:33 PM
Hive, Hive 2, and Impala support for Pentaho client tools will be available in the imminent 4.8.2 suite release. If you are using the 4.8.x products, it is possible to upgrade the Big Data plugin in each of your Pentaho tools, but the procedure is a bit long and would have to be applied to each Pentaho client tool (Report Designer, Data Integration, etc.). The basic procedure is here:


That refers to building the plugin from source, but the artifacts are available on our Continuous Integration server (http://ci.pentaho.com/view/Big%20Data/job/pentaho-big-data-plugin-1.3/) and the latest releases are in our repository:

ZIP: http://repository.pentaho.org/artifactory/pentaho/pentaho/pentaho-big-data-plugin/
Hive JDBC shim JAR: http://repository.pentaho.org/artifactory/pentaho/pentaho/pentaho-hadoop-hive-jdbc-shim/1.3.3/pentaho-hadoop-hive-jdbc-shim-1.3.3.jar

The ZIP file contains the pentaho-big-data-plugin folder, which you would delete from your products and replace with the one from the ZIP. The JAR file goes into each product's JDBC driver directory (usually under libext/JDBC) and replaces the JAR of the same name (but earlier version). Depending on which Hadoop distribution you have (Apache, Cloudera, Hortonworks, MapR, etc.), you will have to configure the plugin to use that distribution (see the procedure at the link above). If you use Apache Hadoop 1.x, you will need to create your own configuration; I outline how to do that in my blog here:
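The folder-and-JAR swap above can be sketched as shell commands. Everything path-related here is an assumption: PENTAHO_HOME, the plugins directory, and the exact artifact file names will vary per tool and per download, so adjust them to your installation and repeat for each client tool.

```shell
# Assumed install location -- adjust for each Pentaho client tool
# (Report Designer, Data Integration, etc.) and repeat the steps there.
PENTAHO_HOME=/opt/pentaho/report-designer

# 1. Delete the old Big Data plugin folder and unpack the one from the ZIP.
#    (ZIP file name is an assumption; use whatever you downloaded.)
rm -rf "$PENTAHO_HOME/plugins/pentaho-big-data-plugin"
unzip pentaho-big-data-plugin.zip -d "$PENTAHO_HOME/plugins/"

# 2. Replace the earlier-version Hive JDBC shim JAR with the new one
#    (JDBC drivers usually live under libext/JDBC).
rm -f "$PENTAHO_HOME"/libext/JDBC/pentaho-hadoop-hive-jdbc-shim-*.jar
cp pentaho-hadoop-hive-jdbc-shim-1.3.3.jar "$PENTAHO_HOME/libext/JDBC/"
```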