Hitachi Vantara Pentaho Community Forums
Results 1 to 5 of 5

Thread: "not a valid DFS filename" while running Pentaho Mapreduce

  1. #1
    Join Date
    Aug 2013
    Posts
    3

    Default "not a valid DFS filename" while running Pentaho Mapreduce

    Using PDI 4.4.0 with CDH 4.3. When I run the "Pentaho MapReduce" job entry in PDI, I get an error log like this:
    WARN 27-08 13:20:04,094 - mapred.job.classpath.files is deprecated. Instead, use mapreduce.job.classpath.files
    INFO 27-08 13:20:04,263 - Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
    INFO 27-08 13:20:04,318 - Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
    INFO 27-08 13:20:04,412 - Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
    INFO 27-08 13:20:04,412 - Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
    WARN 27-08 13:20:04,452 - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    INFO 27-08 13:20:04,835 - Total input paths to process : 1
    INFO 27-08 13:20:04,897 - number of splits:1
    WARN 27-08 13:20:04,908 - mapred.jar is deprecated. Instead, use mapreduce.job.jar
    WARN 27-08 13:20:04,908 - fs.default.name is deprecated. Instead, use fs.defaultFS
    WARN 27-08 13:20:04,908 - mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
    WARN 27-08 13:20:04,909 - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
    WARN 27-08 13:20:04,909 - mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
    WARN 27-08 13:20:04,909 - mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
    WARN 27-08 13:20:04,909 - mapred.job.name is deprecated. Instead, use mapreduce.job.name
    WARN 27-08 13:20:04,910 - mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
    WARN 27-08 13:20:04,910 - mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
    WARN 27-08 13:20:04,910 - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
    WARN 27-08 13:20:04,910 - mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
    WARN 27-08 13:20:04,910 - mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
    WARN 27-08 13:20:04,910 - mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class
    WARN 27-08 13:20:04,910 - mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
    INFO 27-08 13:20:05,060 - Submitting tokens for job: job_1376311735295_0017
    INFO 27-08 13:20:05,463 - Cleaning up the staging area /yarn/stages/work/.staging/job_1376311735295_0017
    java.lang.IllegalArgumentException:
    ERROR 27-08 12:46:14,690 - Pentaho MapReduce - Pathname ......(lots of jar paths omitted):/work/pentaho/mapreduce/5.0.0-M1-TRUNK-SNAPSHOT-cdh42/lib/xom-1.1.jar:/work/pentaho/mapreduce/5.0.0-M1-TRUNK-SNAPSHOT-cdh42/lib/xpp3_min-1.1.4c.jar:/work/pentaho/mapreduce/5.0.0-M1-TRUNK-SNAPSHOT-cdh42/lib/xstream-1.4.2.jar is not a valid DFS filename.
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:176)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:820)
    at org.apache.hadoop.fs.FileSystem.resolvePath(FileSystem.java:730)
    at org.apache.hadoop.mapreduce.v2.util.MRApps.addToClasspathIfNotJar(MRApps.java:230)
    at org.apache.hadoop.mapreduce.v2.util.MRApps.setClasspath(MRApps.java:188)
    at org.apache.hadoop.mapred.YARNRunner.createApplicationSubmissionContext(YARNRunner.java:413)
    at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:288)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:391)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
    at org.pentaho.hadoop.shim.common.CommonHadoopShim.submitJob(CommonHadoopShim.java:228)
    at org.pentaho.di.job.entries.hadooptransjobexecutor.JobEntryHadoopTransJobExecutor.execute(JobEntryHadoopTransJobExecutor.java:821)
    at org.pentaho.di.job.Job.execute(Job.java:589)
    at org.pentaho.di.job.Job.execute(Job.java:728)
    at org.pentaho.di.job.Job.execute(Job.java:728)
    at org.pentaho.di.job.Job.execute(Job.java:443)
    at org.pentaho.di.job.Job.run(Job.java:363)

    Is there anyone who can help me? How can I fix this error? Thanks a lot!

  2. #2
    Join Date
    Aug 2013
    Posts
    3

    Default

    Quote (from post #1):
    java.lang.IllegalArgumentException:
    ERROR 27-08 12:46:14,690 - Pentaho MapReduce - Pathname ......(lots of jar paths omitted):/work/pentaho/mapreduce/5.0.0-M1-TRUNK-SNAPSHOT-cdh42/lib/xom-1.1.jar:/work/pentaho/mapreduce/5.0.0-M1-TRUNK-SNAPSHOT-cdh42/lib/xpp3_min-1.1.4c.jar:/work/pentaho/mapreduce/5.0.0-M1-TRUNK-SNAPSHOT-cdh42/lib/xstream-1.4.2.jar is not a valid DFS filename.

    I have already found the problem: the value of "mapreduce.job.classpath.files" is a bunch of jar paths that should be separated by ',' rather than ':'. So I changed the source code of JobEntryHadoopTransJobExecutor.java like this:

    // mapreduce.job.classpath.files is a comma-separated list, but the jar
    // paths were joined with ':', so YARN treated the whole string as one
    // (invalid) DFS filename. Rewrite the separator before submitting the job.
    String confPaths = conf.get("mapreduce.job.classpath.files");
    conf.set("mapreduce.job.classpath.files", confPaths.replace(":", ","));
    RunningJob runningJob = shim.submitJob(conf);
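
    (Background on why this works: ':' is the Java/Unix classpath separator, while Hadoop configuration properties that hold lists of files, such as mapreduce.job.classpath.files, expect commas. That is why YARN tried to resolve the entire colon-joined string as a single DFS pathname and rejected it.)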

    Now it works and the MapReduce output file is OK, but another error occurred while the job entry was polling for completion:
    INFO 27-08 16:52:10,000 - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    INFO 27-08 16:52:10,005 - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
    ERROR 27-08 16:52:10,008 - PriviledgedActionException as:work (auth:SIMPLE) cause:java.io.IOException
    java.io.IOException
    at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:317)
    at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:385)
    at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:487)
    at org.apache.hadoop.mapreduce.Job$1.run(Job.java:314)
    at org.apache.hadoop.mapreduce.Job$1.run(Job.java:311)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311)
    at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:609)
    at org.apache.hadoop.mapred.JobClient$NetworkedJob.isComplete(JobClient.java:275)
    at org.pentaho.hadoop.shim.common.mapred.RunningJobProxy.isComplete(RunningJobProxy.java:43)
    at org.pentaho.di.job.entries.hadooptransjobexecutor.JobEntryHadoopTransJobExecutor.execute(JobEntryHadoopTransJobExecutor.java:836)
    at org.pentaho.di.job.Job.execute(Job.java:589)
    at org.pentaho.di.job.Job.execute(Job.java:728)
    at org.pentaho.di.job.Job.execute(Job.java:728)
    at org.pentaho.di.job.Job.execute(Job.java:443)
    at org.pentaho.di.job.Job.run(Job.java:363)

  3. #3
    Join Date
    Aug 2010
    Posts
    87

    Default

    If you update spoon.sh's OPT variable to include "-Dhadoop.cluster.path.separator=,", the DistributedCache path generation works with YARN; see the sketch below.
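
    A minimal sketch of that edit to spoon.sh; the exact contents of the OPT line vary between PDI versions, so everything other than the added -D property is an assumption about your local install:

    # in spoon.sh -- append the property to whatever OPT already contains,
    # so the JVM launches with hadoop.cluster.path.separator set to ','
    OPT="$OPT -Dhadoop.cluster.path.separator=,"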

    With this change and a few tweaks to the job configuration in HadoopShim, I've created a CDH 4.4.0 YARN Hadoop Configuration and successfully executed the included Pentaho MapReduce examples against the latest CDH 4.4.0 QuickStart VM running YARN.

    Were you able to work around your other YARN issues? Perhaps I can lend a hand.

  4. #4
    Join Date
    Jan 2015
    Posts
    4

    Default

    Hi, thank you very much! I did it as you said. It works! Thank you!

  5. #5
    Join Date
    Jan 2015
    Posts
    4

    Default

    Hi, Thank you very much!
    I did it as you said. It works.
    Thank you!
