Request to download "Pentaho Data Integration for Hadoop 4.1 GA" for Linux



afancy
12-23-2010, 06:12 AM
Hi,

I have requested the download of "Pentaho Data Integration for Hadoop 4.1 GA" for Linux many times, but I still have not received the download link. Could you check it for me? Thanks.

For now I have downloaded PDI for Windows, but my Hadoop is installed on Linux. The job throws the following exception because it fails to find a class file. It seems that some jar files need to be copied to the Hadoop server. Could you tell me which jars should be copied? Thanks!

###########
Caused by: java.lang.NoClassDefFoundError: org/pentaho/di/core/exception/KettleException
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:762)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:807)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:833)
at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:772)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
... 10 more

Jasper
12-23-2010, 09:00 AM
Hi,
It doesn't matter if your client runs on Windows. It should normally communicate with the Pentaho Distribution for Hadoop (PHD) on the Linux side. I have the same setup and it works.

Did you install the PHD on your Hadoop nodes? There is some 300+ MB of Pentaho software to be installed on each node in your cluster. The PHD can be found in the PDI 4.1 server download, as a zip file under pentaho/server/ named PHD-ee-4.1.0-GA.zip.
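
To check whether a given node actually has the class from the stack trace on it, you can scan the jars in the Hadoop lib directory. This is only a minimal sketch in Python: the helper names `class_to_entry` and `jars_containing` are made up for illustration, and it assumes the Pentaho jars sit directly under the directory you pass in (for example $HADOOP_HOME/lib).

```python
import zipfile
from pathlib import Path


def class_to_entry(class_name):
    """Convert a dotted or slashed class name to its entry path inside a jar."""
    return class_name.replace(".", "/") + ".class"


def jars_containing(lib_dir, class_name):
    """Return the sorted names of the jars directly under lib_dir that contain class_name."""
    entry = class_to_entry(class_name)
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that end in .jar but are not valid archives.
            continue
    return hits
```

Calling `jars_containing("/path/to/hadoop/lib", "org.pentaho.di.core.exception.KettleException")` on a node returns an empty list if no jar in that directory provides the class, which would match the NoClassDefFoundError above.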

afancy
12-28-2010, 04:58 PM
Hi, Jasper,

Thanks! I have found the zip file pentaho\server\phd-ee-4.1.0-GA.zip.
How do I install it on the Hadoop nodes? I cannot find the relevant installation document. The only one I found is this one (http://wiki.pentaho.com/download/attachments/19235112/hadoop_pentaho.pdf), but it does not seem to be for phd-ee-4.1.

Jasper
12-29-2010, 07:09 AM
Well, that installation document is outdated now. Now you can just unzip the PHD into the Hadoop home install directory. The PHD adds some new files to the $HADOOP_HOME/lib directory; unzipping takes care of that.

After unzipping you have to install 2 licenses (PDI Ent. + Hadoop Ent.) and off you go.

You have to repeat this for every node in your cluster.
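
The unzip step can be sketched in Python as follows. This is a rough illustration, not an official installer: `install_phd` is a hypothetical helper, and it assumes the paths inside the PHD zip are relative to the Hadoop home directory (so that entries under lib/ land in $HADOOP_HOME/lib), which has not been verified against the actual archive.

```python
import zipfile
from pathlib import Path


def install_phd(phd_zip, hadoop_home):
    """Extract phd_zip into hadoop_home and return the names newly added under lib/."""
    home = Path(hadoop_home)
    lib = home / "lib"
    # Record what is already in lib/ so the newly added files can be reported.
    before = {p.name for p in lib.iterdir()} if lib.is_dir() else set()
    with zipfile.ZipFile(phd_zip) as zf:
        # Assumption: archive entries are laid out relative to the Hadoop home.
        zf.extractall(home)
    after = {p.name for p in lib.iterdir()} if lib.is_dir() else set()
    return sorted(after - before)
```

You would run something like `install_phd("phd-ee-4.1.0-GA.zip", "/path/to/hadoop")` once on each node. The returned list shows which files appeared in lib/. The two licenses still have to be installed separately afterwards.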