Pentaho PHD 4.1.0 GA and Cloudera's CDH3b3



Jasper
01-18-2011, 05:59 PM
This is from an earlier post in the thread "Linking up PDI to Hadoop":



We've verified PDI 4.1.0-RC1, and the soon-to-be-released PDI 4.1.0-GA, against CDH3 Beta 3.

Since upgrading to CDH3b3, I can still establish connections between the client and Hadoop-0.20.2+737 for HDFS (file exchange), but job execution no longer works.

Does the PHD cope with the renamed/new Hadoop system users on the Hadoop side: hdfs (which replaces the former "hadoop" user) and mapred?

What does this mean for the licenses on the Hadoop server? They used to be installed for the hadoop user only; how does that work now that there are two Hadoop system users?
And how about the new security framework in CDH3b3?

I have already done the following to try to make it work (rough command sketch below):
- reinstalled the two PHD licenses on the Hadoop nodes for the new hdfs user
- renamed the /home/hadoop folder to /home/hdfs
- set the dfs.permissions property in hdfs-site.xml to false
- replaced hadoop-core-0.20.0.jar with hadoop-core-0.20.2+737.jar (on the client (!) in PENTAHO_HOME/data-integration/libext/hive and PENTAHO_HOME/data-integration/libext/pentaho)
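
For reference, here's roughly what that looked like as commands (untested sketch; CDH-style paths and PENTAHO_HOME are assumptions, adjust for your setup):

# rename the old hadoop home dir for the new hdfs user
sudo mv /home/hadoop /home/hdfs

# disable HDFS permission checks; add inside <configuration> in hdfs-site.xml:
#   <property>
#     <name>dfs.permissions</name>
#     <value>false</value>
#   </property>

# on the client: swap the bundled hadoop-core jar for the CDH3b3 one
cd $PENTAHO_HOME/data-integration
rm libext/pentaho/hadoop-core-0.20.0.jar libext/hive/hadoop-core-0.20.0.jar
cp /path/to/hadoop-core-0.20.2+737.jar libext/pentaho/
cp /path/to/hadoop-core-0.20.2+737.jar libext/hive/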

Jasper
01-19-2011, 04:54 AM
Executing a mapred job raises the following error:


---------------------------------------------------------------------------------------------------
2011/01/19 00:26:23 - Aggregation_Impressions - Start of job execution
2011/01/19 00:26:23 - Aggregation_Impressions - exec(0, 0, START.0)
2011/01/19 00:26:23 - START - Starting job entry
2011/01/19 00:26:23 - Aggregation_Impressions - Starting entry [Clean Output]
2011/01/19 00:26:23 - Aggregation_Impressions - exec(1, 0, Clean Output.0)
2011/01/19 00:26:23 - Aggregation_Impressions - Starting entry [Hadoop Count Unique Impressions]
2011/01/19 00:26:23 - Aggregation_Impressions - exec(2, 0, Hadoop Count Unique Impressions.0)
2011/01/19 00:26:23 - Hadoop Count Unique Impressions - Starting job entry
2011/01/19 00:26:24 - Hadoop Count Unique Impressions - ERROR (version 4.1.0-GA, build 14380 from 2010-11-09 17.25.17 by buildguy) : org/codehaus/jackson/map/JsonMappingException
2011/01/19 00:26:24 - Hadoop Count Unique Impressions - ERROR (version 4.1.0-GA, build 14380 from 2010-11-09 17.25.17 by buildguy) : java.lang.NoClassDefFoundError: org/codehaus/jackson/map/JsonMappingException
2011/01/19 00:26:24 - Hadoop Count Unique Impressions - ERROR (version 4.1.0-GA, build 14380 from 2010-11-09 17.25.17 by buildguy) : org.pentaho.di.job.entries.hadooptransjobexecutor.JobEntryHadoopTransJobExecutor.execute(SourceFile:474)
2011/01/19 00:26:24 - Hadoop Count Unique Impressions - ERROR (version 4.1.0-GA, build 14380 from 2010-11-09 17.25.17 by buildguy) : org.pentaho.di.job.Job.execute(Job.java:471)
2011/01/19 00:26:24 - Hadoop Count Unique Impressions - ERROR (version 4.1.0-GA, build 14380 from 2010-11-09 17.25.17 by buildguy) : org.pentaho.di.job.Job.execute(Job.java:600)
2011/01/19 00:26:24 - Hadoop Count Unique Impressions - ERROR (version 4.1.0-GA, build 14380 from 2010-11-09 17.25.17 by buildguy) : org.pentaho.di.job.Job.execute(Job.java:600)
2011/01/19 00:26:24 - Hadoop Count Unique Impressions - ERROR (version 4.1.0-GA, build 14380 from 2010-11-09 17.25.17 by buildguy) : org.pentaho.di.job.Job.execute(Job.java:344)
2011/01/19 00:26:24 - Hadoop Count Unique Impressions - ERROR (version 4.1.0-GA, build 14380 from 2010-11-09 17.25.17 by buildguy) : org.pentaho.di.job.Job.run(Job.java:282)
-----------------------------------------------------------------------------

jganoff
01-19-2011, 05:51 PM
It looks like you're using a step provided by a plugin that isn't installed on the node where that mapper was executed.

jganoff
01-19-2011, 05:56 PM
Does the PHD cope with the renamed/new Hadoop system users on the Hadoop side: hdfs (which replaces the former "hadoop" user) and mapred?


That shouldn't matter for PHD.



What does this mean for the licenses on the Hadoop server? They used to be installed for the hadoop user only; how does that work now that there are two Hadoop system users?


The licenses are, by default, looked up in the $HOME/.kettle/ folder. You can change this by setting the KETTLE_HOME environment variable for the task JVMs.
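
For the task JVMs something like this should work (untested sketch; mapred.child.env is the stock Hadoop 0.20 property for passing environment variables to child tasks, and the path is just an example):

# add inside <configuration> in mapred-site.xml on each tasktracker:
#   <property>
#     <name>mapred.child.env</name>
#     <value>KETTLE_HOME=/opt/pentaho</value>
#   </property>
# then restart the tasktrackers (CDH3 package service name assumed):
sudo service hadoop-0.20-tasktracker restart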

jganoff
01-20-2011, 11:14 AM
Executing a mapred job raises the following error:

java.lang.NoClassDefFoundError: org/codehaus/jackson/map/JsonMappingException

(full log above)

I stand corrected; it looks like this is a known issue when using Cloudera's CDH3b3 hadoop-core.jar and attempting to submit jobs: https://issues.cloudera.org/browse/DISTRO-44

Jasper
01-24-2011, 06:50 AM
Hi,

It looks like when you use the "hadoop jar" CLI command directly on the Hadoop node, both Jackson jars are automatically on the classpath. When the job is submitted by PDI, however, they are not.

I found that a workaround for this is copying the Jackson jars to "/home/hdfs/.kettle/plugins/pdi-hadoop-plugin" and "/home/hdfs/.kettle/plugins/pdi-hadoop-plugin/lib" on the Hadoop nodes, and to $PENTAHO_HOME/data-integration/libext/ on the client.
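
Something along these lines (sketch; the source path and jar names are assumptions, CDH3 packages typically ship jackson-core-asl and jackson-mapper-asl under /usr/lib/hadoop-0.20/lib, use whatever your install provides):

# on the hadoop nodes
mkdir -p /home/hdfs/.kettle/plugins/pdi-hadoop-plugin/lib
cp /usr/lib/hadoop-0.20/lib/jackson-*.jar /home/hdfs/.kettle/plugins/pdi-hadoop-plugin/
cp /usr/lib/hadoop-0.20/lib/jackson-*.jar /home/hdfs/.kettle/plugins/pdi-hadoop-plugin/lib/

# on the client
cp /usr/lib/hadoop-0.20/lib/jackson-*.jar $PENTAHO_HOME/data-integration/libext/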

-------------

That shouldn't matter for PHD. The licenses are, by default, looked up in the $HOME/.kettle/ folder. You can change this by setting the KETTLE_HOME environment variable for the task JVMs.

About the new Hadoop system users "mapred" and "hdfs" (formerly hadoop):

You have to install the two licenses for both users (or at least make the license available to both Hadoop users). If you don't, you get the 'no license found' error as soon as the job executor is touched. I haven't succeeded at pointing all the users to the same KETTLE_HOME dir and installing the licenses (and plugins) only once. Even after setting KETTLE_HOME in /etc/profile, the install_license.sh script still put the resulting XMLs in a separate $USER_HOME/.pentaho dir (not $USER_HOME/.kettle).
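
In other words, something like this (sketch; the license file name/path are examples, run from the directory that ships install_license.sh):

# install the PHD licenses once per Hadoop system user
for u in hdfs mapred; do
  sudo -u $u ./install_license.sh install /tmp/Pentaho_Hadoop_Distribution.lic
done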

jganoff
01-25-2011, 04:25 PM
I haven't succeeded at pointing all the users to the same KETTLE_HOME dir and installing the licenses (and plugins) only once. Even after setting KETTLE_HOME in /etc/profile, the install_license.sh script still put the resulting XMLs in a separate $USER_HOME/.pentaho dir (not $USER_HOME/.kettle).

My mistake; the variable you want to set is called PENTAHO_INSTALLED_LICENSE_PATH. Sorry for the grief!
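
So something like this should let all users share one license store (untested sketch; the .installedLicenses.xml file name and path are assumptions, point it at whatever install_license.sh produced):

# e.g. in /etc/profile, and via mapred.child.env for the task JVMs:
export PENTAHO_INSTALLED_LICENSE_PATH=/opt/pentaho/.installedLicenses.xml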

- Jordan