"not a valid DFS filename" while running Pentaho MapReduce



huxinglong
08-27-2013, 01:24 AM
I'm using PDI 4.4.0 with CDH 4.3. When I run the "Pentaho MapReduce" job entry in PDI, I get an error log like this:
WARN 27-08 13:20:04,094 - mapred.job.classpath.files is deprecated. Instead, use mapreduce.job.classpath.files
INFO 27-08 13:20:04,263 - Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
INFO 27-08 13:20:04,318 - Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
INFO 27-08 13:20:04,412 - Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
INFO 27-08 13:20:04,412 - Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
WARN 27-08 13:20:04,452 - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
INFO 27-08 13:20:04,835 - Total input paths to process : 1
INFO 27-08 13:20:04,897 - number of splits:1
WARN 27-08 13:20:04,908 - mapred.jar is deprecated. Instead, use mapreduce.job.jar
WARN 27-08 13:20:04,908 - fs.default.name is deprecated. Instead, use fs.defaultFS
WARN 27-08 13:20:04,908 - mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
WARN 27-08 13:20:04,909 - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
WARN 27-08 13:20:04,909 - mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
WARN 27-08 13:20:04,909 - mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
WARN 27-08 13:20:04,909 - mapred.job.name is deprecated. Instead, use mapreduce.job.name
WARN 27-08 13:20:04,910 - mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
WARN 27-08 13:20:04,910 - mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
WARN 27-08 13:20:04,910 - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
WARN 27-08 13:20:04,910 - mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
WARN 27-08 13:20:04,910 - mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
WARN 27-08 13:20:04,910 - mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class
WARN 27-08 13:20:04,910 - mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
INFO 27-08 13:20:05,060 - Submitting tokens for job: job_1376311735295_0017
INFO 27-08 13:20:05,463 - Cleaning up the staging area /yarn/stages/work/.staging/job_1376311735295_0017
java.lang.IllegalArgumentException:
ERROR 27-08 12:46:14,690 - Pentaho MapReduce - Pathname ......(lots of jar paths omitted):/work/pentaho/mapreduce/5.0.0-M1-TRUNK-SNAPSHOT-cdh42/lib/xom-1.1.jar:/work/pentaho/mapreduce/5.0.0-M1-TRUNK-SNAPSHOT-cdh42/lib/xpp3_min-1.1.4c.jar:/work/pentaho/mapreduce/5.0.0-M1-TRUNK-SNAPSHOT-cdh42/lib/xstream-1.4.2.jar is not a valid DFS filename.
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:176)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:820)
at org.apache.hadoop.fs.FileSystem.resolvePath(FileSystem.java:730)
at org.apache.hadoop.mapreduce.v2.util.MRApps.addToClasspathIfNotJar(MRApps.java:230)
at org.apache.hadoop.mapreduce.v2.util.MRApps.setClasspath(MRApps.java:188)
at org.apache.hadoop.mapred.YARNRunner.createApplicationSubmissionContext(YARNRunner.java:413)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:288)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:391)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
at org.pentaho.hadoop.shim.common.CommonHadoopShim.submitJob(CommonHadoopShim.java:228)
at org.pentaho.di.job.entries.hadooptransjobexecutor.JobEntryHadoopTransJobExecutor.execute(JobEntryHadoopTransJobExecutor.java:821)
at org.pentaho.di.job.Job.execute(Job.java:589)
at org.pentaho.di.job.Job.execute(Job.java:728)
at org.pentaho.di.job.Job.execute(Job.java:728)
at org.pentaho.di.job.Job.execute(Job.java:443)
at org.pentaho.di.job.Job.run(Job.java:363)

Is there anyone who could help me? How can I fix this error? Thanks a lot!

huxinglong
08-27-2013, 05:03 AM
[QUOTE]
java.lang.IllegalArgumentException:
ERROR 27-08 12:46:14,690 - Pentaho MapReduce - Pathname ... is not a valid DFS filename.
[/QUOTE]

I have already found the problem: the "mapreduce.job.classpath.files" value is a bunch of jar paths that should be separated by ',' rather than ':', so I changed the source code of JobEntryHadoopTransJobExecutor.java like this:

// YARN expects the classpath files list to be comma-separated, but PDI builds it with ':'.
String confPaths = conf.get("mapreduce.job.classpath.files");
if (confPaths != null) {
    conf.set("mapreduce.job.classpath.files", confPaths.replace(":", ","));
}
RunningJob runningJob = shim.submitJob(conf);
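
For anyone who wants to see the effect of the rewrite in isolation, here is a minimal, self-contained sketch (the jar paths below are made up for illustration; it only needs hadoop-common on the classpath):

import org.apache.hadoop.conf.Configuration;

public class ClasspathSeparatorFix {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Hypothetical ':'-separated value, the shape PDI was producing.
        conf.set("mapreduce.job.classpath.files",
                "/work/pentaho/mapreduce/lib/a.jar:/work/pentaho/mapreduce/lib/b.jar");
        String paths = conf.get("mapreduce.job.classpath.files");
        if (paths != null) {
            // Rewrite to the ','-separated form that YARN's MRApps expects.
            conf.set("mapreduce.job.classpath.files", paths.replace(":", ","));
        }
        // Prints: /work/pentaho/mapreduce/lib/a.jar,/work/pentaho/mapreduce/lib/b.jar
        System.out.println(conf.get("mapreduce.job.classpath.files"));
    }
}

One caveat with the blanket replace: it would also mangle any entry that carries a scheme and port (e.g. hdfs://host:8020/...), since those contain ':' too. It works here because the cached jars are plain absolute paths.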

Now it works and the MapReduce output file is OK, but another error occurred while the job entry was completing:
INFO 27-08 16:52:10,000 - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
INFO 27-08 16:52:10,005 - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
ERROR 27-08 16:52:10,008 - PriviledgedActionException as:work (auth:SIMPLE) cause:java.io.IOException
java.io.IOException
at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:317)
at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:385)
at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:487)
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:314)
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:311)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311)
at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:609)
at org.apache.hadoop.mapred.JobClient$NetworkedJob.isComplete(JobClient.java:275)
at org.pentaho.hadoop.shim.common.mapred.RunningJobProxy.isComplete(RunningJobProxy.java:43)
at org.pentaho.di.job.entries.hadooptransjobexecutor.JobEntryHadoopTransJobExecutor.execute(JobEntryHadoopTransJobExecutor.java:836)
at org.pentaho.di.job.Job.execute(Job.java:589)
at org.pentaho.di.job.Job.execute(Job.java:728)
at org.pentaho.di.job.Job.execute(Job.java:728)
at org.pentaho.di.job.Job.execute(Job.java:443)
at org.pentaho.di.job.Job.run(Job.java:363)

jganoff
10-22-2013, 03:24 AM
If you update the OPT variable in spoon.sh to include "-Dhadoop.cluster.path.separator=,", the DistributedCache path generation works with YARN.
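
If you want to try this without editing spoon.sh, note that a -D flag is just a JVM system property, so setting it programmatically before the job is submitted should be equivalent (a sketch, assuming the Pentaho shim reads this property when it generates the DistributedCache paths):

// Equivalent of adding "-Dhadoop.cluster.path.separator=," to OPT in spoon.sh.
// Must run before the shim builds the classpath for the submitted job.
System.setProperty("hadoop.cluster.path.separator", ",");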

With this change and a few tweaks to the job configuration in HadoopShim, I've created a CDH 4.4.0 YARN Hadoop Configuration and have successfully executed the included Pentaho MapReduce examples against the latest CDH 4.4.0 QuickStart VM running YARN.

Were you able to work around your other YARN issues? Perhaps I can lend a hand.

Moore
01-28-2015, 04:13 AM
Hi, thank you very much! I did it as you said. It works. Thank you!
