
View Full Version : Pentaho MapReduce



Manasa_InfoAxon
04-18-2012, 03:32 AM
Hi, I am using Pentaho MapReduce to parse weblog data on MapR, following the sample document provided on wiki.pentaho.com. However, when I try to run the job, it keeps running without showing any error, and the logging tab repeatedly shows "Pentaho-Mapreduce- Setup complete:0.0 Mapper Completion:0.0 Reducer completion:0.0". It shows the same message several times and keeps running without producing any result. When I stop it manually, it finally says "Pentaho Mapreduce failed", again without any error. Please help me. I am using Hadoop 0.20.0 and the Pentaho 4.3 trial version. I have attached the .ktr and .kjb files, pfa. Thank you.

cdeptula
04-18-2012, 03:29 PM
Manasa,

A few things to check:

1. Are you able to run the sample MapReduce job that ships with MapR from the command line?
hadoop jar /opt/mapr/hadoop/hadoop-0.20.2/hadoop-0.20.2-dev-examples.jar wordcount /myvolume/in /myvolume/out
If this also hangs, the problem is with your MapR cluster, not Pentaho, and you need to verify your MapR cluster is running properly.
2. Have you followed the Hadoop Node Configuration steps detailed here: http://wiki.pentaho.com/display/BAD/Configure+Pentaho+for+MapR
3. Are there any errors in your Spoon log that are not appearing in the logging tab? On Windows your Spoon log is in C:\Users\<username>\AppData\Local\Temp and is named spoon*.log. On Linux the Spoon log is usually in /tmp. You may have to close Spoon before these logs are written.
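For step 3 on Linux, a quick way to surface errors the logging tab may have swallowed (a sketch; /tmp is the default location mentioned above, adjust if yours differs):

```shell
# Show the last few error lines or stack-trace entries from any Spoon log in /tmp.
# Errors are suppressed so the command exits cleanly even when no log exists yet.
grep -n -E "ERROR|Exception" /tmp/spoon*.log 2>/dev/null | tail -n 20
```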

Hope this helps,
Chris

Manasa_InfoAxon
04-19-2012, 04:00 AM
Thank you for suggestions,


1. I executed the hadoop-examples jar from the command prompt; there is no error and it runs fine, which means the MapR cluster is running fine.


2. I have installed hadoop-0.20.2 on Windows 7 using Cygwin, and I followed "Configure Pentaho for MapR" from wiki.pentaho.com.
During configuration I updated launcher.properties as described there, deleted hadoop-0.20.2.jar, and copied hadoop-0.20.2-dev-core.jar into $PDI_HOME/libext. After this step I could not find maprfs-0.1.jar. The wiki's Hadoop configuration also says to add "/opt/pentaho/pentaho-mapreduce/lib/*" to HADOOP_CLASSPATH in hadoop-env.sh; however, since I am running Hadoop on Windows through Cygwin and have only extracted the Pentaho 4.3 trial version, I do not have any folder called pentaho-mapreduce. So which path should be added to HADOOP_CLASSPATH in hadoop-env.sh? I also have not done the next step, updating conf/mapred-site.xml, because, as I said, I do not have the pentaho/pentaho-mapreduce path.
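For reference, the step in question amounts to adding a line like the one below to conf/hadoop-env.sh. The /opt/pentaho/pentaho-mapreduce path is what the wiki assumes for a Linux node; under Cygwin the equivalent local directory where those jars actually end up would be substituted (an assumption, since the forum posters have not confirmed a Cygwin layout):

```shell
# Excerpt for conf/hadoop-env.sh, per the wiki page referenced above.
# The path is the Linux-node default; adjust it to wherever the
# pentaho-mapreduce jars live on your machine.
export HADOOP_CLASSPATH=/opt/pentaho/pentaho-mapreduce/lib/*:$HADOOP_CLASSPATH
```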


3. I tried to run the Pentaho MapReduce job again, as attached; now it gives me this error:
error: java.lang.ClassNotFoundException: org.pentaho.di.core.exception.KettleException
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
2012/04/19 12:19:42 - at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
2012/04/19 12:19:42 - at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
2012/04/19 12:19:42 - at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
2012/04/19 12:19:42 - at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
2012/04/19 12:19:42 - at java.lang.Class.forName0(Native Method)
2012/04/19 12:19:42 - at java.lang.Class.forName(Class.java:247)
2012/04/19 12:19:42 - at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:762)
2012/04/19 12:19:42 - at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:807)
2012/04/19 12:19:42 - at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:833)
2012/04/19 12:19:42 - at org.apache.hadoop.mapred.JobConf.getMapRunnerClass(JobConf.java:790)
2012/04/19 12:19:42 - at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
2012/04/19 12:19:42 - at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
2012/04/19 12:19:42 - at org.apache.hadoop.mapred.Child.main(Child.java:170)


Please help.
Thank you for the suggestions,

cdeptula
04-19-2012, 10:08 AM
Manasa,

I incorrectly thought you said you were using MapR Hadoop. If you are using Apache Hadoop 0.20.2 you are not using MapR and should instead follow the instructions here: http://wiki.pentaho.com/display/BAD/Configure+Pentaho+for+Cloudera+and+Other+Hadoop+Versions. You do need to follow the Hadoop Node Configuration steps. Note the PHD component these steps tell you to install is a separate download and install from the PDI 4.3 Preview release. This install will create the Pentaho MapReduce folder. The link to the download is in the instructions. I have never used Hadoop with Cygwin so I am not sure if there are any quirks to the install.

When PDI 4.3 goes GA the PHD install will no longer be necessary, but it is required for the 4.3 Preview release.

You should also follow the Hadoop How-Tos instead of the MapR ones: http://wiki.pentaho.com/display/BAD/Hadoop.

Chris

kepha
07-26-2012, 09:18 PM
Hi, I'm trying the same thing as Manasa and I encounter exactly the same problem. I followed the instructions you suggested, but I still get the same exception.
The other thing is that, as I understood from those guidelines, the PHD is not necessary anymore. I just installed Apache Hadoop 0.20.2 to run locally as a single node.
Does anyone have some insight on this? Manasa, did you manage to solve it?
Thanks

kepha
07-27-2012, 02:29 PM
Does anyone have any insight into this problem? I still have the same issue.
I run hadoop-0.20.2 locally, I followed the instructions for this example:
http://wiki.pentaho.com/display/BAD/Using+Pentaho+MapReduce+to+Parse+Weblog+Data.
*In some parts the guidelines differ from the images; I suppose this happened when the text was changed. Anyway, just to let you know.

Since I run hadoop-0.20.2, I suppose no other Kettle configuration is needed, as stated here:
http://wiki.pentaho.com/display/BAD/Configure+Pentaho+for+Cloudera+and+Other+Hadoop+Versions

I tested Hadoop independently and it works OK.

I do not see what the problem could be; I also tried very simple mapper transformations and the problem is still the same.
Here is Spoon's log:

2012/07/27 11:16:31 - Pentaho MapReduce 2 - Setup Complete: 0.0 Mapper Completion: 0.0 Reducer Completion: 0.0
2012/07/27 11:16:36 - Pentaho MapReduce 2 - Setup Complete: 0.0 Mapper Completion: 0.0 Reducer Completion: 0.0
2012/07/27 11:16:41 - Pentaho MapReduce 2 - Setup Complete: 100.0 Mapper Completion: 0.0 Reducer Completion: 0.0
2012/07/27 11:16:46 - Pentaho MapReduce 2 - Setup Complete: 100.0 Mapper Completion: 0.0 Reducer Completion: 0.0
2012/07/27 11:16:46 - Pentaho MapReduce 2 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : [FAILED] -- Task: 0 Attempt: 0 Event: 1
2012/07/27 11:16:46 - Pentaho MapReduce 2 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : Error: java.lang.ClassNotFoundException: org.pentaho.di.core.exception.KettleException
2012/07/27 11:16:46 - Pentaho MapReduce 2 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
2012/07/27 11:16:46 - Pentaho MapReduce 2 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
2012/07/27 11:16:46 - Pentaho MapReduce 2 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : at java.security.AccessController.doPrivileged(Native Method)
2012/07/27 11:16:46 - Pentaho MapReduce 2 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
2012/07/27 11:16:46 - Pentaho MapReduce 2 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
2012/07/27 11:16:46 - Pentaho MapReduce 2 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
2012/07/27 11:16:46 - Pentaho MapReduce 2 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
2012/07/27 11:16:46 - Pentaho MapReduce 2 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : at java.lang.Class.forName0(Native Method)
2012/07/27 11:16:46 - Pentaho MapReduce 2 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : at java.lang.Class.forName(Class.java:264)
2012/07/27 11:16:46 - Pentaho MapReduce 2 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:762)
2012/07/27 11:16:46 - Pentaho MapReduce 2 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:807)
2012/07/27 11:16:46 - Pentaho MapReduce 2 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:833)
2012/07/27 11:16:46 - Pentaho MapReduce 2 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : at org.apache.hadoop.mapred.JobConf.getMapRunnerClass(JobConf.java:790)
2012/07/27 11:16:46 - Pentaho MapReduce 2 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
2012/07/27 11:16:46 - Pentaho MapReduce 2 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
2012/07/27 11:16:46 - Pentaho MapReduce 2 - ERROR (version 4.3.0-stable, build 16786 from 2012-04-24 14.11.32 by buildguy) : at org.apache.hadoop.mapred.Child.main(Child.java:170)

jganoff
07-30-2012, 08:10 PM
Hi kepha,

You're correct that the PHD is no longer required. You appear to be running an older version of the Big Data Plugin that relies upon it though. Please try the latest PDI 4.3.0 stable release. You should see log messages indicating the Kettle environment is being staged into HDFS before Pentaho MapReduce begins execution.
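As a sanity check after upgrading (a sketch, only runnable against a live cluster; the staging directory is configurable in the plugin's pentaho-mapreduce.properties, and /opt/pentaho/mapreduce is only an assumed default), the staged Kettle environment can be listed in HDFS:

```shell
# After a successful Pentaho MapReduce run with the newer plugin, the staged
# Kettle environment should appear under the configured HDFS staging directory.
hadoop fs -ls /opt/pentaho/mapreduce
```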

Best,
Jordan