Hadoop Copy files Step in Job



rajeshbcrec
05-03-2012, 02:47 AM
Hi Team,

I am unable to load a .txt file into HDFS with Pentaho 4.3.0 GA. This is the error I am facing; please let me know what I am doing wrong:

2012/05/03 12:13:14 - test_load_hdfs - Starting entry [Hadoop Copy Files]
2012/05/03 12:13:14 - Hadoop Copy Files - Starting ...
2012/05/03 12:13:14 - Hadoop Copy Files - Processing row source File/folder source : [file:///E:/weblogs_rebuild.txt] ... destination file/folder : [hdfs://localhost:9000/usr/pdi/weblogs/raw]... wildcard : [^.*\.txt]
2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : Can not copy file/folder [file:///E:/weblogs_rebuild.txt] to [hdfs://localhost:9000/usr/pdi/weblogs/raw]. Exception : [
2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) :
2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : Unable to get VFS File object for filename 'hdfs://localhost:9000/usr/pdi/weblogs/raw' : Could not resolve file "hdfs://localhost/usr/pdi/weblogs/raw".
2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) :
2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : ]
2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : org.pentaho.di.core.exception.KettleFileException:
2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) :
2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : Unable to get VFS File object for filename 'hdfs://localhost:9000/usr/pdi/weblogs/raw' : Could not resolve file "hdfs://localhost/usr/pdi/weblogs/raw".
2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) :
2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) :
2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : at org.pentaho.di.core.vfs.KettleVFS.getFileObject(KettleVFS.java:161)
2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : at org.pentaho.di.core.vfs.KettleVFS.getFileObject(KettleVFS.java:104)
2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : at org.pentaho.di.job.entries.copyfiles.JobEntryCopyFiles.ProcessFileFolder(JobEntryCopyFiles.java:376)
2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : at org.pentaho.di.job.entries.copyfiles.JobEntryCopyFiles.execute(JobEntryCopyFiles.java:324)
2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : at org.pentaho.di.job.Job.execute(Job.java:528)
2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : at org.pentaho.di.job.Job.execute(Job.java:667)
2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : at org.pentaho.di.job.Job.execute(Job.java:393)
2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : at org.pentaho.di.job.Job.run(Job.java:313)
2012/05/03 12:13:15 - test_load_hdfs - Finished job entry [Hadoop Copy Files] (result=[false])
2012/05/03 12:13:15 - test_load_hdfs - Job execution finished
2012/05/03 12:13:15 - Spoon - Job has ended

cdeptula
05-07-2012, 10:05 AM
Make sure the hadoop core jar in the $PDI_HOME/libext/pentaho folder matches the version of Hadoop you are using. (Might be $PDI_HOME/libext/bigdata in 4.3 GA?) 4.3.0 GA ships with the Apache Hadoop 0.20 jar. You can get the jar for your version from the $HADOOP_HOME directory of your cluster.

http://wiki.pentaho.com/display/BAD/Configure+Pentaho+for+Cloudera+and+Other+Hadoop+Versions
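As a rough example of what that looks like on disk (assumed paths and jar version numbers, adjust to your own PDI and Hadoop installs):

# check which Hadoop core jar PDI currently ships with
ls $PDI_HOME/libext/bigdata/hadoop-core-*.jar

# replace it with the jar taken from your cluster (filename here is just an example)
cp $HADOOP_HOME/hadoop-core-0.20.203.0.jar $PDI_HOME/libext/bigdata/

Restart Spoon afterwards so the new jar is on the classpath.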

cdeptula
05-08-2012, 09:44 AM
Try copying commons-configuration-1.7.jar (http://commons.apache.org/configuration/download_configuration.cgi) from $HADOOP_HOME/lib (I think) to $PDI_HOME/libext/bigdata as well. The manual (http://wiki.pentaho.com/display/BAD/Configure+Pentaho+for+Cloudera+and+Other+Hadoop+Versions) says to do this for 0.20.205; it may also be required for 0.20.203, but I am not positive.
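Roughly something like this (untested sketch; I am assuming the jar sits in $HADOOP_HOME/lib on your cluster, so double-check the path first):

cp $HADOOP_HOME/lib/commons-configuration-*.jar $PDI_HOME/libext/bigdata/

and restart Spoon so it gets picked up.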

jganoff
05-09-2012, 09:02 AM
Make sure you remove any other hadoop core jar files except for the one that matches your cluster. The Hadoop 0.20.203.0 core jar (hadoop-core-0.20.203.0.jar) contains the class the exception claims it cannot find: org/apache/hadoop/fs/FSDataOutputStream.
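For example (assumed filenames; keep whichever jar actually matches your cluster):

cd $PDI_HOME/libext/bigdata
ls hadoop-core-*.jar            # list every Hadoop core jar present
rm hadoop-core-0.20.2.jar       # remove any mismatched ones, leaving only the jar from your cluster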

jganoff
05-10-2012, 10:46 AM
That looks like a cluster configuration issue. The "could only be replicated to 0 nodes, instead of 1" error can have a number of causes. Are you able to copy the file using the Hadoop command-line tool? hadoop fs -copyFromLocal <localsrc> URI
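For example, using the destination from your job (the local source path here is just a placeholder):

hadoop fs -copyFromLocal /tmp/weblogs_rebuild.txt hdfs://localhost:9000/usr/pdi/weblogs/raw
hadoop fs -ls /usr/pdi/weblogs/raw

If that command fails with the same replication error, the problem is on the cluster side rather than in PDI.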

rajeshbcrec
05-16-2012, 11:23 PM
It was not working because of a DataNode problem; it is working now.


Hi,

bin/hadoop: line 258: C:\Program: command not found
12/05/12 12:58:17 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /usr/pdi/weblogs/raw could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)


at org.apache.hadoop.ipc.Client.call(Client.java:740)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy0.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy0.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2937)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2819)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)


12/05/12 12:58:17 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
12/05/12 12:58:17 WARN hdfs.DFSClient: Could not get block locations. Source file "/usr/pdi/weblogs/raw" - Aborting...
put: java.io.IOException: File /usr/pdi/weblogs/raw could only be replicated to 0 nodes, instead of 1
12/05/12 12:58:17 ERROR hdfs.DFSClient: Exception closing file /usr/pdi/weblogs/raw : org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /usr/pdi/weblogs/raw could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)


org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /usr/pdi/weblogs/raw could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)


at org.apache.hadoop.ipc.Client.call(Client.java:740)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy0.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy0.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2937)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2819)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)


Thanks & Regards
Vijay
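In case it helps anyone else hitting the "could only be replicated to 0 nodes" error, two quick ways to confirm the DataNode is actually up and registered with the NameNode (rough sketch; exact output varies by Hadoop version):

jps                             # should list a DataNode process on the data node
hadoop dfsadmin -report         # "Datanodes available" should be at least 1

If no DataNode shows up, check the DataNode logs under $HADOOP_HOME/logs.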

smush_n
07-24-2012, 06:14 AM
I also encountered this problem. My Hadoop version is 0.20.203.0, and I have copied hadoop-core-0.20.203.0.jar to libext/bigdata, but the problem still exists. I don't know why. I am waiting for your answer, thank you!

yvkumar
07-24-2012, 07:21 AM
Can you try with Hadoop 0.20.2 or a higher version, such as Hadoop 1.0?

smush_n
07-24-2012, 09:03 PM
Is there any other, better method? Thank you!

jganoff
07-24-2012, 09:15 PM
Are you able to write to the cluster from the command line?

smush_n
07-24-2012, 10:29 PM
Yes, I have tried copying files from the local system into HDFS from the command line, and I have also run the "wordcount" example successfully, so I think the Hadoop cluster has no problem. Thank you for your help! My QQ: 446503972

smush_n
07-25-2012, 01:54 AM
I have resolved the problem. commons-configuration-1.6.jar seems to have been the cause: copy it into the bigdata directory too and restart your computer, then it will work. Thank you for your help, and good luck!

hapjin
05-16-2015, 04:50 AM
What is your Hadoop cluster version? Apache Hadoop 0.2x? Apache Hadoop 2.x.x? Or another distribution, such as Hortonworks' or Cloudera's?
I also encountered this problem. I installed Kettle on Windows, but the cluster is on Linux. Have you solved it?

hapjin
05-16-2015, 05:35 AM
What version of Hadoop cluster do you use? I also encountered this problem. I installed Spoon on Windows, and the Hadoop 2.6.0 cluster is on Ubuntu.
Did you solve that problem, and how did you solve it?