Hitachi Vantara Pentaho Community Forums

Thread: Pentaho PHD 4.1.0 GA and Cloudera's CDH3b3

  1. #1

    Pentaho PHD 4.1.0 GA and Cloudera's CDH3b3

    This is from an earlier post in the thread "Linking up PDI to Hadoop":

    Quote Originally Posted by jganoff
    We've verified PDI 4.1.0-RC1, and the soon-to-be-released PDI 4.1.0-GA, against CDH3 Beta 3.
    Since I upgraded to CDH3b3 I can only get the client and Hadoop-0.20.2+737 talking as far as HDFS (file exchange) is concerned; the execution of jobs no longer works.

    Does the PHD cope with the renamed/new Hadoop system users on the Hadoop side: hdfs (the renamed former "hadoop" user) and mapred?

    What does this mean for the licenses on the Hadoop server? They used to be installed for the hadoop user only, but how does this work now that there are two Hadoop system users?
    And how about the new security framework in CDH3b3?

    I have already done the following to make it work (a shell sketch of these steps follows the list):
    - reinstalled the 2 licenses for PHD on the hadoop nodes for the new hdfs user
    - renamed the /home/hadoop folder to /home/hdfs
    - set the dfs.permissions property in hdfs-site.xml to false
    - replaced hadoop-core-0.20.0.jar with hadoop-core-0.20.2+737.jar (on the client (!) in PENTAHO_HOME/data-integration/libext/hive and PENTAHO_HOME/data-integration/libext/pentaho)
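
    A rough shell sketch of those steps; the install_license.sh invocation, the CDH jar location (/usr/lib/hadoop-0.20) and $PENTAHO_HOME are assumptions and may differ on your installation:

    ---------------------------------------------------------------------------------------------------
    # On each Hadoop node, as root unless noted otherwise.

    # 1) Reinstall the PHD licenses for the new hdfs user
    #    (install_license.sh syntax assumed here -- check your PHD package)
    sudo -u hdfs ./install_license.sh install /path/to/your-license.lic

    # 2) Move the old hadoop home over to the new hdfs user
    mv /home/hadoop /home/hdfs && chown -R hdfs:hdfs /home/hdfs

    # 3) Disable HDFS permission checking in hdfs-site.xml:
    #      <property><name>dfs.permissions</name><value>false</value></property>

    # 4) On the CLIENT, swap the bundled hadoop-core jar for the CDH3b3 one
    cd $PENTAHO_HOME/data-integration
    rm libext/pentaho/hadoop-core-0.20.0.jar libext/hive/hadoop-core-0.20.0.jar
    cp /usr/lib/hadoop-0.20/hadoop-core-0.20.2+737.jar libext/pentaho/
    cp /usr/lib/hadoop-0.20/hadoop-core-0.20.2+737.jar libext/hive/
    ---------------------------------------------------------------------------------------------------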
    Last edited by Jasper; 01-19-2011 at 04:55 AM.

  2. #2

    mapred job: org/codehaus/jackson/map/JsonMappingException

    Executing a mapred job raises the following error:


    ---------------------------------------------------------------------------------------------------
    2011/01/19 00:26:23 - Aggregation_Impressions - Start of job execution
    2011/01/19 00:26:23 - Aggregation_Impressions - exec(0, 0, START.0)
    2011/01/19 00:26:23 - START - Starting job entry
    2011/01/19 00:26:23 - Aggregation_Impressions - Starting entry [Clean Output]
    2011/01/19 00:26:23 - Aggregation_Impressions - exec(1, 0, Clean Output.0)
    2011/01/19 00:26:23 - Aggregation_Impressions - Starting entry [Hadoop Count Unique Impressions]
    2011/01/19 00:26:23 - Aggregation_Impressions - exec(2, 0, Hadoop Count Unique Impressions.0)
    2011/01/19 00:26:23 - Hadoop Count Unique Impressions - Starting job entry
    2011/01/19 00:26:24 - Hadoop Count Unique Impressions - ERROR (version 4.1.0-GA, build 14380 from 2010-11-09 17.25.17 by buildguy) : org/codehaus/jackson/map/JsonMappingException
    2011/01/19 00:26:24 - Hadoop Count Unique Impressions - ERROR (version 4.1.0-GA, build 14380 from 2010-11-09 17.25.17 by buildguy) : java.lang.NoClassDefFoundError: org/codehaus/jackson/map/JsonMappingException
    2011/01/19 00:26:24 - Hadoop Count Unique Impressions - ERROR (version 4.1.0-GA, build 14380 from 2010-11-09 17.25.17 by buildguy) : org.pentaho.di.job.entries.hadooptransjobexecutor.JobEntryHadoopTransJobExecutor.execute(SourceFile:474)
    2011/01/19 00:26:24 - Hadoop Count Unique Impressions - ERROR (version 4.1.0-GA, build 14380 from 2010-11-09 17.25.17 by buildguy) : org.pentaho.di.job.Job.execute(Job.java:471)
    2011/01/19 00:26:24 - Hadoop Count Unique Impressions - ERROR (version 4.1.0-GA, build 14380 from 2010-11-09 17.25.17 by buildguy) : org.pentaho.di.job.Job.execute(Job.java:600)
    2011/01/19 00:26:24 - Hadoop Count Unique Impressions - ERROR (version 4.1.0-GA, build 14380 from 2010-11-09 17.25.17 by buildguy) : org.pentaho.di.job.Job.execute(Job.java:600)
    2011/01/19 00:26:24 - Hadoop Count Unique Impressions - ERROR (version 4.1.0-GA, build 14380 from 2010-11-09 17.25.17 by buildguy) : org.pentaho.di.job.Job.execute(Job.java:344)
    2011/01/19 00:26:24 - Hadoop Count Unique Impressions - ERROR (version 4.1.0-GA, build 14380 from 2010-11-09 17.25.17 by buildguy) : org.pentaho.di.job.Job.run(Job.java:282)
    -----------------------------------------------------------------------------

  3. #3

    That looks like you're using a step provided by a plugin that is not installed on whichever node that mapper was being executed on.

  4. #4

    Quote Originally Posted by Jasper
    Does the PHD cope with the renamed/new Hadoop system users on the Hadoop side: hdfs (the renamed former "hadoop" user) and mapred?
    That shouldn't matter for PHD.

    Quote Originally Posted by Jasper
    What does this mean for the licenses on the Hadoop server? They used to be installed for the hadoop user only, but how does this work now that there are two Hadoop system users?
    The licenses are, by default, looked up in the $HOME/.kettle/ folder. You can change this by setting the KETTLE_HOME environment variable for the task JVMs.
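
    A minimal sketch of that suggestion, assuming a shared directory of your own choosing (/opt/pentaho/kettle_home below is a placeholder) that both the hdfs and mapred users can read; note that later posts in this thread found the license location itself is governed by PENTAHO_INSTALLED_LICENSE_PATH rather than KETTLE_HOME:

    ---------------------------------------------------------------------------------------------------
    # Create a shared Kettle home readable by both Hadoop system users
    mkdir -p /opt/pentaho/kettle_home/.kettle
    chmod -R a+rX /opt/pentaho/kettle_home

    # Expose it to shell sessions, e.g. via /etc/profile; whether the
    # map/reduce child JVMs actually inherit it depends on how your
    # TaskTracker is launched.
    echo 'export KETTLE_HOME=/opt/pentaho/kettle_home' >> /etc/profile

    # With KETTLE_HOME set, Kettle resolves its files under
    # $KETTLE_HOME/.kettle/ instead of $HOME/.kettle/.
    ---------------------------------------------------------------------------------------------------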

  5. #5

    Quote Originally Posted by Jasper
    Executing a mapred job raises the following error:
    java.lang.NoClassDefFoundError: org/codehaus/jackson/map/JsonMappingException
    (full log in post #2 above)
    I stand corrected; it looks like this is a known issue when using Cloudera's CDH3b3 hadoop-core.jar and attempting to submit jobs: https://issues.cloudera.org/browse/DISTRO-44

  6. #6


    Hi,

    It looks like when you use the CLI "hadoop jar" command directly on the Hadoop node, both jackson jars are automatically on the classpath. When the job is submitted by PDI, however, they are not.

    I found that a workaround for this is copying the jackson jars to /home/hdfs/.kettle/plugins/pdi-hadoop-plugin and /home/hdfs/.kettle/plugins/pdi-hadoop-plugin/lib on the Hadoop nodes, and to $PENTAHO_HOME/data-integration/libext/ on the client (sketched below).
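
    A sketch of that copy, assuming the jars are taken from the CDH node's /usr/lib/hadoop-0.20/lib/ directory (the exact source path and jar versions are assumptions); the missing class org.codehaus.jackson.map.JsonMappingException ships in jackson-mapper-asl, with jackson-core-asl as its companion:

    ---------------------------------------------------------------------------------------------------
    # On each Hadoop node (as the hdfs user)
    cp /usr/lib/hadoop-0.20/lib/jackson-core-asl-*.jar \
       /usr/lib/hadoop-0.20/lib/jackson-mapper-asl-*.jar \
       /home/hdfs/.kettle/plugins/pdi-hadoop-plugin/
    cp /usr/lib/hadoop-0.20/lib/jackson-core-asl-*.jar \
       /usr/lib/hadoop-0.20/lib/jackson-mapper-asl-*.jar \
       /home/hdfs/.kettle/plugins/pdi-hadoop-plugin/lib/

    # On the PDI client
    cp /usr/lib/hadoop-0.20/lib/jackson-*.jar $PENTAHO_HOME/data-integration/libext/
    ---------------------------------------------------------------------------------------------------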

    -------------
    Quote Originally Posted by jganoff
    That shouldn't matter for PHD. The licenses are, by default, looked up in the $HOME/.kettle/ folder. You can change this by setting the KETTLE_HOME environment variable for the task JVMs.
    About the new Hadoop system users "mapred" and "hdfs" (formerly "hadoop"):

    you have to install the two licenses for both users (or at least make the license available to both Hadoop system users). If you don't, you get the 'no license found' error as soon as the job executor is reached. I haven't succeeded in pointing all the users to the same KETTLE_HOME dir and installing the licenses (and plugins) only once. Even after setting KETTLE_HOME in /etc/profile, the install_license.sh script still put the resulting XMLs in a separate $USER_HOME/.pentaho dir (not $USER_HOME/.kettle).
    Last edited by Jasper; 01-24-2011 at 09:24 AM.

  7. #7

    Quote Originally Posted by Jasper
    I haven't succeeded in pointing all the users to the same KETTLE_HOME dir and installing the licenses (and plugins) only once. Even after setting KETTLE_HOME in /etc/profile, the install_license.sh script still put the resulting XMLs in a separate $USER_HOME/.pentaho dir (not $USER_HOME/.kettle).
    My mistake; the variable you want to set is PENTAHO_INSTALLED_LICENSE_PATH. Sorry for the grief!
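
    A minimal sketch of that fix, assuming the licenses were installed once to a shared, world-readable location (the path and file name below are placeholders):

    ---------------------------------------------------------------------------------------------------
    # Point every Hadoop system user at one shared license store,
    # e.g. via /etc/profile or the environment of the task JVMs.
    export PENTAHO_INSTALLED_LICENSE_PATH=/opt/pentaho/.installedLicenses.xml
    ---------------------------------------------------------------------------------------------------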

    - Jordan
