Hitachi Vantara Pentaho Community Forums

Thread: HDFS connection and the Hadoop Input step

  1. #1

    Default HDFS connection and the Hadoop Input step

    Hello,

    I am trying to connect to HDFS on Cloudera. I don't have any problem connecting with Hue
    or with the command line, moving files from the local file system to HDFS.

    What should the parameters be for the connection to work?
    Am I forgetting something?

    The default port in Spoon is 9000,
    but I see 8020 used a lot.

    Either way, neither of them works.

    Please help.

    orin

  2. #2
    Join Date
    Mar 2010
    Posts
    9

    Default

    Are you getting an error? Something like "Could not resolve file "hdfs://myserver:8020/".:"

    Sean

  3. #3
    Join Date
    Mar 2010
    Posts
    9

    Default

    Please make sure that this file:
    data-integration/plugins/pentaho-big-data-plugin/plugin.properties

    has the correct active.hadoop.configuration set.

    The configurations we support are:

    - cdh3u4
    - cdh4
    - hadoop-20
    - mapr

    The default is "hadoop-20". I tried to connect to our CDH4 test server with that value set in the plugin.properties file and was unable to connect.

    I changed it to "cdh4", restarted Spoon, and was able to connect.
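
    For reference, the relevant line in plugin.properties would look something like this (the value shown is just an example; use the shim name that matches your cluster):

    # data-integration/plugins/pentaho-big-data-plugin/plugin.properties
    active.hadoop.configuration=cdh4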

    Hope this helps,
    Sean

  4. #4
    Join Date
    Dec 2012
    Posts
    4

    Default

    Yes, I am getting a similar error. I am using a CDH4 two-node cluster.
    I searched for a solution on Google with no luck. Please help me.
    The following is the log I got while running a copy-to-HDFS job:

    2012/12/11 10:39:56 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : Can not copy file/folder [hdfs://myserver:8020/user/pdi/weblogs/parse] to [hdfs://myserver:8020/user/hive/warehouse/weblogs]. Exception : [
    2012/12/11 10:39:56 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) :
    2012/12/11 10:39:56 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : Unable to get VFS File object for filename 'hdfs://myserver/user/pdi/weblogs/parse' : Could not resolve file "hdfs://myserver:8020/user/pdi/weblogs/parse".
    2012/12/11 10:39:56 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) :
    2012/12/11 10:39:56 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : ]
    Last edited by prav2828; 12-12-2012 at 01:39 AM.

  5. #5
    Join Date
    Jan 2013
    Posts
    1

    Default

    Quote Originally Posted by prav2828
    Yes, I am getting a similar error. I am using a CDH4 two-node cluster.
    I searched for a solution on Google with no luck. Please help me.
    The following is the log I got while running a copy-to-HDFS job:

    2012/12/11 10:39:56 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : Can not copy file/folder [hdfs://myserver:8020/user/pdi/weblogs/parse] to [hdfs://myserver:8020/user/hive/warehouse/weblogs]. Exception : [
    2012/12/11 10:39:56 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) :
    2012/12/11 10:39:56 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : Unable to get VFS File object for filename 'hdfs://myserver/user/pdi/weblogs/parse' : Could not resolve file "hdfs://myserver:8020/user/pdi/weblogs/parse".
    2012/12/11 10:39:56 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) :
    2012/12/11 10:39:56 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : ]
    Hi,
    Please try the following solution; it worked for me:
    Run the command: ip addr (in the VM's terminal emulator)
    Copy the IP address shown after "inet", e.g. 192.168.29.129 in the example below:
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:b6:80:4e brd ff:ff:ff:ff:ff:ff
    inet 192.168.29.129/24 brd 192.168.29.255 scope global eth0

    When you try to connect to HDFS from Spoon, use the above IP address with port 8020.
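
    If you want to double-check the address and port before going back to Spoon, a quick sanity check from a machine with the Hadoop client installed could be something like this (the IP and port are just the example values above; the actual NameNode port is whatever fs.default.name / fs.defaultFS points to in your cluster's core-site.xml):

    hadoop fs -ls hdfs://192.168.29.129:8020/
    # if this lists the HDFS root, Spoon should be able to reach the NameNode with the same host and port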

    Hope this will be useful....

  6. #6
    Join Date
    Feb 2012
    Posts
    4

    Default Setting the Active Hadoop Configuration instructions

    I noticed a few people struggling with the same issue. There's documentation for that: Setting the Active Hadoop Configuration. The next article to look at would be Configuring for Cloudera.

    If your configuration isn't currently supported, send an email to support or log a JIRA ticket so we get a better idea of customer needs.
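
    Roughly, what those articles walk you through looks something like this (directory names are from memory of the PDI 4.x layout, so double-check them against the docs):

    data-integration/plugins/pentaho-big-data-plugin/plugin.properties
        -> set active.hadoop.configuration to the shim for your distro, e.g. cdh4
    data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/cdh4/
        -> copy your cluster's core-site.xml, hdfs-site.xml and mapred-site.xml here, then restart Spoon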

    Hope that helps.
    Last edited by jpaz; 01-24-2013 at 11:42 AM.

  7. #7
    Join Date
    Feb 2012
    Posts
    4

    Default

    If they are using PDI 4.3, the shim probably won't work.

  8. #8
    Join Date
    Dec 2008
    Posts
    9

    Default

    This was very useful to me: Configuring Pentaho for your Hadoop Distro and Version

    This link might also be helpful: Define Hadoop Connections

    I wish I had realized I had to configure Pentaho to talk to HDFS. Once I figured that out, it went smoothly.
    Last edited by ejb11235; 07-11-2014 at 03:02 PM.

  9. #9

    Default

    Thanks for sharing
