
TaskTracker does not pick up Pentaho jars from HADOOP_CLASSPATH in Cloudera CDH3



pstnotpd
04-20-2012, 01:53 AM
Hi all,

I've successfully deployed the weblog parse & aggregate jobs to a Cloudera cluster built with Cloudera Manager Free Edition.
However, this only worked after I copied all the PHD jar files to /usr/lib/hadoop/lib on every TaskTracker node in the cluster.

I've followed the steps for Cloudera configuration described here:

http://wiki.pentaho.com/display/BAD/Configure+Pentaho+for+Cloudera+and+Other+Hadoop+Versions

That is, the PHD is copied to /opt/pentaho/pentaho-mapreduce, the "hadoop-env.sh" file in /usr/lib/hadoop/conf is edited to include the HADOOP_CLASSPATH entry, and "mapred-site.xml" is edited as described.
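For reference, the HADOOP_CLASSPATH entry from that step can be generated rather than typed by hand. The following is only a minimal sketch, assuming the jars sit directly in the PHD directory and in a `lib` subdirectory (the real PHD layout may differ); `hadoop_classpath_line` is a hypothetical helper, not part of Pentaho or Hadoop. It keeps any existing `$HADOOP_CLASSPATH` and appends each jar explicitly.

```python
import glob
import os


def hadoop_classpath_line(phd_dir: str) -> str:
    """Build an `export HADOOP_CLASSPATH=...` line for hadoop-env.sh.

    Assumption: jars live in phd_dir and in phd_dir/lib (hypothetical layout).
    """
    jars = sorted(
        glob.glob(os.path.join(phd_dir, "*.jar"))
        + glob.glob(os.path.join(phd_dir, "lib", "*.jar"))
    )
    # Keep whatever classpath is already set, then append each PHD jar.
    parts = ["$HADOOP_CLASSPATH"] + jars
    return 'export HADOOP_CLASSPATH="' + ":".join(parts) + '"'


if __name__ == "__main__":
    print(hadoop_classpath_line("/opt/pentaho/pentaho-mapreduce"))
```

Listing each jar avoids relying on the `dir/*` classpath wildcard; if no jars are found, the line simply re-exports the existing value.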

I noticed that the entries in "mapred-site.xml" were not picked up when I restarted the services through Cloudera Manager.
When I put the <property> entries on the TaskTracker configuration page of Cloudera Manager, they were picked up correctly.

From this I assume that hadoop-env.sh is treated the same way, i.e. it is "managed" by Cloudera Manager, but I cannot find where in Cloudera Manager entries like HADOOP_CLASSPATH can be added.
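To check whether a given HADOOP_CLASSPATH value actually reaches the PHD jars, a small diagnostic like the one below may help. It is a sketch under stated assumptions: entries are colon-separated (Linux), a trailing `*` is treated as the Java classpath wildcard for all jars in that directory, and plain directories are reported as providing no jars. `jars_on_classpath` is a hypothetical helper. Note that the environment of your shell may not match the one Cloudera Manager gives the TaskTracker process.

```python
import glob
import os


def jars_on_classpath(classpath: str) -> dict:
    """Map each classpath entry to the sorted jar file names it provides.

    Missing paths map to an empty list, which makes typos easy to spot.
    """
    result = {}
    # Split on ':' (Linux) and skip empty entries such as a trailing ':'.
    for entry in filter(None, classpath.split(":")):
        if entry.endswith("*"):
            # Java wildcard entry: every .jar in that directory.
            found = glob.glob(os.path.join(entry[:-1], "*.jar"))
        elif os.path.isfile(entry):
            found = [entry]
        else:
            found = []
        result[entry] = sorted(os.path.basename(p) for p in found)
    return result


if __name__ == "__main__":
    for entry, jars in jars_on_classpath(os.environ.get("HADOOP_CLASSPATH", "")).items():
        print(entry, "->", jars or "no jars found")
```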

Does anybody know where in Cloudera Manager I can edit HADOOP_CLASSPATH to add the PHD path?