
Thread: Hadoop Copy files Step in Job

  1. #1

    Hadoop Copy files Step in Job

    Hi Team,

    I am unable to load the .txt file into HDFS in Pentaho 4.3.0 GA. This is the error I am facing; please let me know what I am doing wrong:

    2012/05/03 12:13:14 - test_load_hdfs - Starting entry [Hadoop Copy Files]
    2012/05/03 12:13:14 - Hadoop Copy Files - Starting ...
    2012/05/03 12:13:14 - Hadoop Copy Files - Processing row source File/folder source : [file:///E:/weblogs_rebuild.txt] ... destination file/folder : [hdfs://localhost:9000/usr/pdi/weblogs/raw]... wildcard : [^.*\.txt]
    2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : Can not copy file/folder [file:///E:/weblogs_rebuild.txt] to [hdfs://localhost:9000/usr/pdi/weblogs/raw]. Exception : [
    2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) :
    2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : Unable to get VFS File object for filename 'hdfs://localhost:9000/usr/pdi/weblogs/raw' : Could not resolve file "hdfs://localhost/usr/pdi/weblogs/raw".
    2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) :
    2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : ]
    2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : org.pentaho.di.core.exception.KettleFileException:
    2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) :
    2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : Unable to get VFS File object for filename 'hdfs://localhost:9000/usr/pdi/weblogs/raw' : Could not resolve file "hdfs://localhost/usr/pdi/weblogs/raw".
    2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) :
    2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) :
    2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : at org.pentaho.di.core.vfs.KettleVFS.getFileObject(KettleVFS.java:161)
    2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : at org.pentaho.di.core.vfs.KettleVFS.getFileObject(KettleVFS.java:104)
    2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : at org.pentaho.di.job.entries.copyfiles.JobEntryCopyFiles.ProcessFileFolder(JobEntryCopyFiles.java:376)
    2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : at org.pentaho.di.job.entries.copyfiles.JobEntryCopyFiles.execute(JobEntryCopyFiles.java:324)
    2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : at org.pentaho.di.job.Job.execute(Job.java:528)
    2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : at org.pentaho.di.job.Job.execute(Job.java:667)
    2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : at org.pentaho.di.job.Job.execute(Job.java:393)
    2012/05/03 12:13:15 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : at org.pentaho.di.job.Job.run(Job.java:313)
    2012/05/03 12:13:15 - test_load_hdfs - Finished job entry [Hadoop Copy Files] (result=[false])
    2012/05/03 12:13:15 - test_load_hdfs - Job execution finished
    2012/05/03 12:13:15 - Spoon - Job has ended

  2. #2

    Make sure the Hadoop core jar in the $PDI_HOME/libext/pentaho folder (it may be $PDI_HOME/libext/bigdata in 4.3 GA) matches the version of Hadoop you are using. 4.3.0 GA ships with the Apache Hadoop 0.20 jar. You can get the jar for your version from the $HADOOP_HOME directory of your cluster.

    http://wiki.pentaho.com/display/BAD/...adoop+Versions
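
    For example, a quick way to compare the jar PDI ships with against the one on your cluster (paths and jar names below are only illustrative; the core jar is named hadoop-0.20.2-core.jar in 0.20.2 and hadoop-core-0.20.203.0.jar in later 0.20.20x releases):

    # jar bundled with PDI (folder may be libext/pentaho or libext/bigdata depending on the release)
    ls $PDI_HOME/libext/bigdata/*hadoop*core*.jar
    # jar used by the cluster
    ls $HADOOP_HOME/*hadoop*core*.jar
    # if the versions differ, swap in the cluster's jar (adjust the file name to your version)
    cp $HADOOP_HOME/hadoop-core-0.20.203.0.jar $PDI_HOME/libext/bigdata/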

  3. #3

    Try also copying commons-configuration-1.7.jar from $HADOOP_HOME/lib (I think) to $PDI_HOME/libext/bigdata. The manual says to do this for 0.20.205; it may also be required for 0.20.203, but I am not positive.
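
    A rough sketch of that copy, assuming the jar really is under $HADOOP_HOME/lib and that your PDI folder is libext/bigdata:

    # copy the commons-configuration jar from the cluster into PDI's big data lib folder
    cp $HADOOP_HOME/lib/commons-configuration-1.7.jar $PDI_HOME/libext/bigdata/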

  4. #4

    Make sure you remove any Hadoop core jar files other than the one that matches your cluster. The Hadoop 0.20.203.0 core jar (hadoop-core-0.20.203.0.jar) contains the class the exception claims it cannot find: org/apache/hadoop/fs/FSDataOutputStream.
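
    If you want to double-check, something like this should do it (the jar name and folder are just examples; adjust them to your install):

    # confirm only one Hadoop core jar is on PDI's classpath
    ls $PDI_HOME/libext/bigdata/ | grep -i "hadoop.*core"
    # confirm the jar you kept actually contains FSDataOutputStream
    jar tf $PDI_HOME/libext/bigdata/hadoop-core-0.20.203.0.jar | grep FSDataOutputStream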

  5. #5

    That looks like a cluster configuration issue. The "could only be replicated to 0 nodes, instead of 1" error can have a number of causes. Are you able to copy the file using the Hadoop command-line tool? hadoop fs -copyFromLocal <localsrc> URI
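
    Something along these lines, with the source path adjusted to wherever your file actually sits (the local path below is only an example):

    # try the copy with the Hadoop CLI; if this also fails, the problem is the cluster, not PDI
    hadoop fs -copyFromLocal /tmp/weblogs_rebuild.txt hdfs://localhost:9000/usr/pdi/weblogs/raw
    # check that at least one datanode is live and has free capacity
    hadoop dfsadmin -report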

  6. #6


    It was not working because of a DataNode problem. It is working now.

    Quote Originally Posted by rajeshbcrec:
    Hi,

    bin/hadoop: line 258: C:\Program: command not found
    12/05/12 12:58:17 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /usr/pdi/weblogs/raw could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)


    at org.apache.hadoop.ipc.Client.call(Client.java:740)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy0.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy0.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2937)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2819)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)


    12/05/12 12:58:17 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
    12/05/12 12:58:17 WARN hdfs.DFSClient: Could not get block locations. Source file "/usr/pdi/weblogs/raw" - Aborting...
    put: java.io.IOException: File /usr/pdi/weblogs/raw could only be replicated to 0 nodes, instead of 1
    12/05/12 12:58:17 ERROR hdfs.DFSClient: Exception closing file /usr/pdi/weblogs/raw : org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /usr/pdi/weblogs/raw could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)


    org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /usr/pdi/weblogs/raw could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)


    at org.apache.hadoop.ipc.Client.call(Client.java:740)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy0.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy0.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2937)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2819)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)


    Thanks & Regards
    Vijay

  7. #7

    I also encountered this problem. My version of Hadoop is 0.20.203.0 and I have copied hadoop-core-0.20.203.0.jar to libext/bigdata, but the problem still exists. I don't know why; waiting for your answer, thank you!

  8. #8

    Can you try with Hadoop 0.20.2 or a higher version, such as Hadoop 1.0?

  9. #9

    Is there any other, better method? Thank you!

  10. #10

    Are you able to write to the cluster from the command line?
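
    For instance, a minimal write test from the machine running Spoon (the file name and target path are just examples):

    # create a small local file and push it to HDFS to confirm basic write access
    echo "write test" > /tmp/pdi_write_test.txt
    hadoop fs -copyFromLocal /tmp/pdi_write_test.txt /tmp/pdi_write_test.txt
    hadoop fs -ls /tmp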
