Problem with "Using Pentaho MapReduce to Generate an Aggregate Dataset" tutorial

10-17-2012, 10:44 AM
Hi all!

I'm running Kettle 4.3 on a VMware virtual machine downloaded from cloudera.com: CDH3 Update 3 (cdh3u3) https://ccp.cloudera.com/display/SUPPORT/CDH+Downloads

I followed the configuration steps at http://wiki.pentaho.com/display/BAD/Configure+Pentaho+for+Cloudera+and+Other+Hadoop+Versions, except for point C, which is labelled "For Hadoop 0.20.205" and, as I understand it, does not apply to my setup.

I downloaded the example from http://wiki.pentaho.com/display/BAD/Using+Pentaho+MapReduce+to+Generate+an+Aggregate+Dataset and configured the Pentaho MapReduce step.

When I run the job, everything seems to work until it logs the "Configuring Pentaho MapReduce job to use Kettle installation" message; after that it makes no further progress. The application stays responsive and there are no error messages in the log:

INFO 17-10 16:19:28,777 - Spoon - Starting job...
INFO 17-10 16:19:28,793 - aggregate_mr - Start of job execution
INFO 17-10 16:19:28,873 - aggregate_mr - Starting entry [Pentaho MapReduce]
INFO 17-10 16:19:29,570 - aggregate_mapper - Dispatching started for transformation [aggregate_mapper]
INFO 17-10 16:19:29,668 - aggregate_reducer - Dispatching started for transformation [aggregate_reducer]
INFO 17-10 16:19:29,696 - Pentaho MapReduce - Configuring for Hadoop distribution: Cloudera
INFO 17-10 16:19:30,401 - Pentaho MapReduce - Cleaning output path: hdfs://localhost:8020/weblogs/aggregate_mr
INFO 17-10 16:19:30,446 - Pentaho MapReduce - Configuring Pentaho MapReduce job to use Kettle installation from /opt/pentaho/mapreduce/4.3.0

Any idea what it could be?

Thank you very much!


10-17-2012, 12:33 PM
Resolved: the file path was incorrect.

I had used hdfs://localhost:8020/weblogs/aggregate_mr instead of hdfs://localhost:8020/user/pdi/weblogs/aggregate_mr.

Suggestion: Kettle could throw an error instead of sitting idle when the path doesn't exist.
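Until Kettle reports this itself, one workaround is to check the HDFS paths before launching the job, so a wrong path fails immediately instead of leaving the job idle. Below is a minimal sketch in Python, not part of Kettle or the tutorial. It assumes the `hadoop` command-line client is on PATH and configured for the cluster, and relies on `hadoop fs -test -e <path>` exiting with 0 when the path exists. The names `require_hdfs_paths`, `hdfs_path_exists` and `MissingHdfsPathError` are hypothetical.

```python
import subprocess
from typing import Callable, Iterable, List


class MissingHdfsPathError(Exception):
    """Raised when one or more HDFS paths the job depends on do not exist."""


def hdfs_path_exists(uri: str) -> bool:
    """Return True if `hadoop fs -test -e <uri>` exits with status 0.

    Assumption: the `hadoop` client is on PATH and points at the cluster.
    If the binary is missing, subprocess.run raises FileNotFoundError.
    """
    result = subprocess.run(
        ["hadoop", "fs", "-test", "-e", uri],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def require_hdfs_paths(
    uris: Iterable[str],
    exists: Callable[[str], bool] = hdfs_path_exists,
) -> None:
    """Raise MissingHdfsPathError listing every path that does not exist.

    `exists` is injectable so the check can be tested without a cluster.
    """
    missing: List[str] = [uri for uri in uris if not exists(uri)]
    if missing:
        raise MissingHdfsPathError("HDFS path(s) not found: " + ", ".join(missing))
```

For example, `require_hdfs_paths(["hdfs://localhost:8020/user/pdi/weblogs"])` would confirm the base directory from this thread before the job is started. Check the job's input directory, and for the output check its parent rather than the output directory itself. The log shows Pentaho cleaning the output path first, and a MapReduce output directory is normally expected not to exist when the job is submitted.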