
View Full Version : connection hdfs and hadoop input step



orin28
11-26-2012, 03:04 PM
Hello,

I am trying to connect to the HDFS of a Cloudera cluster. I don't have any problem connecting with Hue
or with the command line, moving files from the local file system to HDFS.

What should the parameters be in order for this to work?
Am I forgetting something?

The default port in Spoon is 9000,
but I see 8020 mentioned a lot.

Either way, neither of them works.

Please help,

orin

sflatley
11-27-2012, 11:23 AM
Are you getting an error? Something like: Could not resolve file "hdfs://myserver:8020/".

Sean

sflatley
11-27-2012, 11:57 AM
Please make sure that this file:
data-integration/plugins/pentaho-big-data-plugin/plugin.properties

has the correct active.hadoop.configuration set.

The configurations we support are:

- cdh3u4
- cdh4
- hadoop-20
- mapr

The default is "hadoop-20". I tried to connect to our CDH4 test server with that value set in the plugin.properties file and was unable to connect.

I changed it to "cdh4", restarted Spoon, and was able to connect.
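
For example, the relevant line in the plugin.properties file mentioned above would look like this (illustrative only; the value has to match one of the supported configurations listed above):

active.hadoop.configuration=cdh4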

Hope this helps,
Sean

prav2828
12-11-2012, 01:46 AM
Yes, I am getting a similar error. I am using a CDH4 two-node cluster.
I searched for a solution on Google but had no luck... please help me.
The following are the logs I got while running a copy-to-HDFS job:

2012/12/11 10:39:56 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : Can not copy file/folder [hdfs://myserver:8020/user/pdi/weblogs/parse] to [hdfs://myserver:8020/user/hive/warehouse/weblogs]. Exception : [
2012/12/11 10:39:56 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) :
2012/12/11 10:39:56 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : Unable to get VFS File object for filename 'hdfs://myserver/user/pdi/weblogs/parse' : Could not resolve file "hdfs://myserver:8020/user/pdi/weblogs/parse".
2012/12/11 10:39:56 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) :
2012/12/11 10:39:56 - Hadoop Copy Files - ERROR (version 4.3.0-GA, build 16753 from 2012-04-18 21.39.30 by buildguy) : ]

ratnakar
01-24-2013, 02:01 AM
Hi,
Please try the following solution; it worked for me:
Run the command: ip addr (in the VM's terminal emulator)
Copy the IP address shown after "inet", e.g. 192.168.29.129 in the example below:
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:b6:80:4e brd ff:ff:ff:ff:ff:ff
inet 192.168.29.129/24 brd 192.168.29.255 scope global eth0

When you try to connect to HDFS using Spoon, use the above IP address with port 8020.
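
With the address above, the HDFS URL used in Spoon would then take a form like this (illustrative, reusing the path from the logs; assuming 8020 is the NameNode port on your cluster):

hdfs://192.168.29.129:8020/user/pdi/weblogs/parse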

Hope this will be useful....

jpaz
01-24-2013, 11:30 AM
I noticed a few people struggling with the same issue. There's documentation for that: Setting the Active Hadoop Configuration (http://infocenter.pentaho.com/help/index.jsp?topic=%2Fpdi_admin_guide%2Freference_active_hadoop_configuration.html). The next article to look at would be: Configuring for Cloudera (http://infocenter.pentaho.com/help/index.jsp?topic=%2Fpdi_admin_guide%2Ftask_configuring_cloudera.html).

If your configuration isn't currently supported, send an email to support or log a JIRA ticket so we get a better idea of customer needs.
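
As a quick sanity check (a minimal sketch, assuming the default install layout where the shims live in a hadoop-configurations folder next to plugin.properties), you can list the bundled configurations and confirm which one is active:

ls data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations
grep active.hadoop.configuration data-integration/plugins/pentaho-big-data-plugin/plugin.properties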

Hope that helps.

jpaz
01-25-2013, 03:06 PM
If they are using PDI 4.3, the shim probably won't work.

ejb11235
07-11-2014, 03:00 PM
This was very useful to me: Configuring Pentaho for your Hadoop Distro and Version (http://wiki.pentaho.com/display/BAD/Configuring+Pentaho+for+your+Hadoop+Distro+and+Version)

This link might also be helpful: Define Hadoop Connections (http://infocenter.pentaho.com/help/topic/bigdata_guide/task_configuring_cloudera.html)

I wish I had realized sooner that I had to configure Pentaho to talk to HDFS. Once I figured it out, it went smoothly.

martina10001
08-06-2014, 06:03 AM
Thanks for sharing