10-07-2012, 07:29 AM

I am getting the following error while running a MapReduce transformation in Pentaho:

ERROR 07-10 06:23:39,909 - Pentaho MapReduce - [FAILED] -- Task: 1 Attempt: 2 Event: 12
java.lang.IllegalArgumentException: Invalid DFS directory name
at org.apache.hadoop.hdfs.DistributedFileSystem.setWorkingDirectory(DistributedFileSystem.java:150)
at org.apache.hadoop.mapred.Child$4.run(Child.java:269)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
at org.apache.hadoop.mapred.Child.main(Child.java:264)

Can anyone suggest what the issue might be?

12-06-2012, 02:23 PM
It happened in our environment as well. A MapReduce job that had been running fine suddenly started failing for no apparent reason, with the same error message you are seeing. In our case, the following sequence of events caused the problem:
1. We are currently using PDI 4.3. One of the developers downloaded PDI 4.4 for learning purposes and connected to the repository.
2. The PDI 4.4 MapReduce step has no option to specify the Hadoop distribution (Cloudera) or the temporary work directory (/tmp). When that developer opened the M/R job and saved it using PDI 4.4, these 2 fields got wiped out in the job.
3. The rest of the team opened the job in PDI 4.3 and submitted the M/R jobs. The jobs failed because those 2 fields were now empty.

Put the values for those 2 fields back and you should be good. Not sure how this has to be handled in 4.4 though.
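
The trace in the first post shows the exception coming from DistributedFileSystem.setWorkingDirectory with no directory name after the message, which fits a blank work directory. A rough sketch of a sanity check one could run on the configured value before submitting a job is below. Everything in it is an assumption for illustration: WorkDirCheck and isPlausibleDfsDir are hypothetical names, not part of PDI or Hadoop, and the rules (absolute path, no "." or ".." components, no colons) are a guess at what a usable DFS directory looks like, not a copy of Hadoop's own validation.

```java
// Hypothetical pre-submit check; not part of PDI or Hadoop.
class WorkDirCheck {

    // Returns true if 'dir' looks like a usable DFS working directory.
    // Assumed rules: non-blank, absolute, no "." or ".." components, no ':'.
    static boolean isPlausibleDfsDir(String dir) {
        if (dir == null || dir.trim().isEmpty()) {
            return false; // a wiped-out field ends up here
        }
        if (!dir.startsWith("/")) {
            return false; // relative paths are rejected
        }
        for (String part : dir.split("/")) {
            if (part.equals(".") || part.equals("..") || part.contains(":")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "/tmp" is the value from the post; "" simulates the wiped-out field.
        String[] candidates = {"/tmp", "", "tmp", "/tmp/../etc"};
        for (String c : candidates) {
            System.out.println("'" + c + "' -> " + isPlausibleDfsDir(c));
        }
    }
}
```

With this check, "/tmp" passes and an empty string fails, so a job whose work directory field was cleared would be caught before it reaches the cluster.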

12-07-2012, 03:39 AM
This doesn't look like a known issue in JIRA.

Please create a bug report for this: http://jira.pentaho.com
The Bug Reports and Feature Requests FAQ is here: http://wiki.pentaho.com/display/EAI/Bug+Reports+and+Feature+Requests+FAQ