Hitachi Vantara Pentaho Community Forums

Thread: Problems with copying files to Hadoop

  1. #1

    Problems with copying files to Hadoop

    Hi,
    I've run into problems copying a CSV file from my local machine to Hadoop using Pentaho Kettle.


    I have installed the following software versions:


    - Cloudera 5.4 - QuickStart VM with CDH 5.4.x (the virtual machine runs on Windows 7 under VMware Player);
    - Hadoop 2.6.0 (/usr/share/cmf/cloudera-navigator-server/libs/cdh5/hadoop-core-2.6.0-mr1-cdh5.4.0.jar);
    - Pentaho Kettle (pdi-ce-5.3.0.0-213).


    192.168.159.128 is the IP address of the virtual machine where Hadoop is installed.




    The Pentaho log is below:


    --------------------------------------------------------------------


    2015/06/17 12:53:57 - DBCache - Loading database cache from file: [C:\Users\Admin\.kettle\db.cache-5.3.0.0-213]
    2015/06/17 12:53:57 - DBCache - We read 47 cached rows from the database cache!
    2015/06/17 12:53:58 - Spoon - Trying to open the last file used.
    2015/06/17 12:53:58 - Version checker - OK
    2015/06/17 12:54:04 - Spoon - Spoon
    2015/06/17 12:54:11 - Spoon - Starting job...
    2015/06/17 12:54:14 - hadoop_copy_file - Start of job execution
    2015/06/17 12:54:14 - hadoop_copy_file - exec(0, 0, START.0)
    2015/06/17 12:54:14 - START - Starting job entry
    2015/06/17 12:54:14 - hadoop_copy_file - Starting entry [Hadoop Copy Files]
    2015/06/17 12:54:14 - hadoop_copy_file - exec(1, 0, Hadoop Copy Files.0)
    2015/06/17 12:54:14 - Hadoop Copy Files - Starting job entry
    2015/06/17 12:54:14 - Hadoop Copy Files - Starting ...
    2015/06/17 12:54:14 - Hadoop Copy Files - Processing row source File/folder source : [file:///C:/0. Tkachev/0. Projects/6. МТТ/Архитектура/Hadoop/hadoop_input_file.csv] ... destination file/folder : [hdfs://192.168.159.128:8020/user/ktkachev/in]... wildcard : [null]
    2015/06/17 12:54:15 - Hadoop Copy Files - file [hdfs://192.168.159.128:8020/user/ktkachev/in\hadoop_input_file.csv] exists!
    2015/06/17 12:54:15 - Hadoop Copy Files - File [file:///C:/0. Tkachev/0. Projects/6. МТТ/Архитектура/Hadoop/hadoop_input_file.csv] was overwritten
    2015/06/17 12:54:16 - Hadoop Copy Files - ERROR (version 5.3.0.0-213, build 1 from 2015-02-02_12-17-08 by buildguy) : File System Exception: Could not copy "file:///C:/0. Tkachev/0. Projects/6. МТТ/Архитектура/Hadoop/hadoop_input_file.csv" to "hdfs://192.168.159.128:8020/user/ktkachev/in/hadoop_input_file.csv".
    2015/06/17 12:54:16 - Hadoop Copy Files - ERROR (version 5.3.0.0-213, build 1 from 2015-02-02_12-17-08 by buildguy) : Caused by: Could not close the output stream for file "hdfs://192.168.159.128:8020/user/ktkachev/in/hadoop_input_file.csv".
    2015/06/17 12:54:16 - Hadoop Copy Files - ERROR (version 5.3.0.0-213, build 1 from 2015-02-02_12-17-08 by buildguy) : Caused by: File /user/ktkachev/in/hadoop_input_file.csv could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1541)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3243)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:645)
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:212)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:483)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)
    2015/06/17 12:54:16 - hadoop_copy_file - Finished job entry [Hadoop Copy Files] (result=[false])
    2015/06/17 12:54:16 - hadoop_copy_file - Job execution finished
    2015/06/17 12:54:16 - Spoon - Job has ended.


    --------------------------------------------------------------------
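
    For reference, the failing copy can be reproduced outside of Kettle with the plain Hadoop FileSystem API. Below is a minimal sketch: it assumes the CDH 5.4 client jars are on the classpath, and the dfs.client.use.datanode.hostname line is only a guess at the usual QuickStart-VM networking problem, not a confirmed fix. Host, user and paths are taken from the log above.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Minimal sketch: copy the same CSV with the plain Hadoop client, bypassing
    // Kettle, to check whether the cluster itself accepts the write.
    public class HdfsCopyTest {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Guess: when the client sits outside the VM, the namenode hands back
            // the datanode's internal address; this setting asks the client to use
            // the datanode hostname instead, which must resolve on the Windows host.
            conf.set("dfs.client.use.datanode.hostname", "true");

            FileSystem fs = FileSystem.get(
                    URI.create("hdfs://192.168.159.128:8020"), conf, "ktkachev");

            Path src = new Path("C:/0. Tkachev/0. Projects/6. МТТ/Архитектура/Hadoop/hadoop_input_file.csv");
            Path dst = new Path("/user/ktkachev/in/hadoop_input_file.csv");

            fs.copyFromLocalFile(false, true, src, dst); // keep source, overwrite target
            System.out.println("Copied, size on HDFS = " + fs.getFileStatus(dst).getLen());
            fs.close();
        }
    }
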


    Could you help me resolve this problem?
    Thanks in advance.


    Konstantin.

  2. #2


    Either the datanode can't talk to the namenode, the replication factor is misconfigured, or there is some other configuration issue in your virtual cluster.
    I would start by googling 'There are 1 datanode(s) running and 1 node(s) are excluded in this operation.'
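
    As a rough sketch (assuming the Hadoop client jars are on the classpath; the URI is the one from the first post), the same datanode report that "hdfs dfsadmin -report" prints can also be pulled from client code, which usually shows straight away why a write is being refused:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

    // Minimal sketch: list the datanodes the namenode knows about and the space
    // they report. No live datanodes, or 0 MB remaining, would explain the
    // "replicated to 0 nodes" error above.
    public class DatanodeReport {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(
                    URI.create("hdfs://192.168.159.128:8020"), new Configuration());
            if (fs instanceof DistributedFileSystem) {
                for (DatanodeInfo dn : ((DistributedFileSystem) fs).getDataNodeStats()) {
                    System.out.printf("%s  state=%s  remaining=%d MB%n",
                            dn.getHostName(), dn.getAdminState(),
                            dn.getRemaining() / (1024 * 1024));
                }
            }
            fs.close();
        }
    }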

  3. #3


    Hi,



    I am also facing this issue when copying a CSV file from my local machine to Apache Hadoop 2.4.1 using PDI 6.1.


    The Pentaho log is shown below:


    2016/04/21 17:38:34 - cfgbuilder - Warning: The configuration parameter [org] is not supported by the default configuration builder for scheme: sftp
    2016/04/21 17:39:28 - Spoon - Starting job...
    2016/04/21 17:39:28 - test_copy - Start of job execution
    2016/04/21 17:39:28 - test_copy - Starting entry [Hadoop Copy Files]
    2016/04/21 17:39:28 - Hadoop Copy Files - Starting ...
    2016/04/21 17:39:28 - Hadoop Copy Files - Processing row source File/folder source : [file:///D:/ETL/region/dim_circle_20160419.csv] ... destination file/folder : [hdfs://10.140.224.24:9000/data_prep/external/]... wildcard : [null]
    2016/04/21 17:39:28 - cfgbuilder - Warning: The configuration parameter [org] is not supported by the default configuration builder for scheme: sftp
    2016/04/21 17:39:28 - cfgbuilder - Warning: The configuration parameter [org] is not supported by the default configuration builder for scheme: sftp
    2016/04/21 17:39:28 - cfgbuilder - Warning: The configuration parameter [org] is not supported by the default configuration builder for scheme: sftp
    2016/04/21 17:39:32 - cfgbuilder - Warning: The configuration parameter [org] is not supported by the default configuration builder for scheme: sftp
    2016/04/21 17:39:37 - Hadoop Copy Files - ERROR (version 6.1.0.1-196, build 1 from 2016-04-07 12.08.49 by buildguy) : File System Exception: Could not copy "file:///D:/ETL/region/dim_circle_20160419.csv" to "hdfs://10.140.224.24:9000/data_prep/external/dim_circle_20160419.csv".
    2016/04/21 17:39:37 - Hadoop Copy Files - ERROR (version 6.1.0.1-196, build 1 from 2016-04-07 12.08.49 by buildguy) : Caused by: Could not close the output stream for file "hdfs://10.140.224.24:9000/data_prep/external/dim_circle_20160419.csv".
    2016/04/21 17:39:37 - Hadoop Copy Files - ERROR (version 6.1.0.1-196, build 1 from 2016-04-07 12.08.49 by buildguy) : Caused by: Connection refused: no further information
    2016/04/21 17:39:37 - test_copy - Finished job entry [Hadoop Copy Files] (result=[false])
    2016/04/21 17:39:37 - test_copy - Job execution finished
    2016/04/21 17:39:37 - Spoon - Job has ended.
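
    In case it is useful for diagnosis, here is a minimal sketch that only checks whether the namenode RPC port is reachable from the client machine and whether the destination directory can be seen. Host, port and path are taken from the log above; nothing else about the cluster is assumed. A "Connection refused" at this level usually means the namenode is not listening on that host:port (for example, fs.defaultFS bound to localhost) rather than a problem with the file itself.

    import java.net.InetSocketAddress;
    import java.net.Socket;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Minimal sketch: raw TCP check against the namenode port, then a simple
    // HDFS call against the destination directory.
    public class NamenodeCheck {
        public static void main(String[] args) throws Exception {
            try (Socket s = new Socket()) {
                s.connect(new InetSocketAddress("10.140.224.24", 9000), 5000);
                System.out.println("TCP connect to 10.140.224.24:9000 OK");
            }

            FileSystem fs = FileSystem.get(
                    URI.create("hdfs://10.140.224.24:9000"), new Configuration());
            System.out.println("Destination exists: "
                    + fs.exists(new Path("/data_prep/external/")));
            fs.close();
        }
    }
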




    Could you please help me resolve this issue?




    Thanks,
    Jay Ojha

  4. #4


    Looks like a permission issue. Are you able to Test your cluster configuration successfully?
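
    If the connection itself succeeds, one quick way to see whether it really is a permission problem is to look at who owns the destination directory before attempting the copy. A minimal sketch under the same assumptions (URI and path from the earlier post; the client connects as the local OS user unless HADOOP_USER_NAME is set):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Minimal sketch: print owner, group and permissions of the target directory
    // so a permission mismatch shows up before the copy is attempted.
    public class PermissionCheck {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(
                    URI.create("hdfs://10.140.224.24:9000"), new Configuration());
            FileStatus st = fs.getFileStatus(new Path("/data_prep/external/"));
            System.out.println("owner=" + st.getOwner()
                    + " group=" + st.getGroup()
                    + " permission=" + st.getPermission());
            fs.close();
        }
    }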
