Problem with accessing HDFS through Hadoop File Input

01-10-2012, 04:27 AM
I am trying to connect to Hadoop/HDFS (pseudo-distributed) using the Hadoop File Input step. I need to browse to a file on HDFS and select it in the Open File window that pops up. But when I click Connect, the URL hdfs://username:password@localhost:8020 appears in the Open From Folder field, and I cannot see any file or folder names on HDFS. Hadoop and PDI are installed under the same user account on this system.
Following are the details I provided in the Connection section of Hadoop File Input:
Server: localhost
Port: 8020
User ID: the user name under which Hadoop and PDI are installed
Password: the password for that user ID
Kindly guide me.

01-23-2012, 09:20 AM
Please make sure that the hadoop-core library in the $PDI_HOME/libext/pentaho directory matches the one from your cluster, and that there is only one copy of it. Kettle includes the CDH3u0 Hadoop Core jar by default.
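A quick way to check the "only one" condition is to list the hadoop-core jars in that directory. This is a minimal Python sketch, assuming PDI_HOME is available as an environment variable (falling back to the current directory) and that the jars follow the hadoop-core*.jar naming pattern. It only compares file names; it does not check that the version matches the cluster.

```python
import fnmatch
import os


def hadoop_core_jars(filenames):
    """Return the sorted file names that match the hadoop-core*.jar pattern."""
    return sorted(f for f in filenames if fnmatch.fnmatchcase(f, "hadoop-core*.jar"))


def check_hadoop_core(filenames):
    """Report whether exactly one hadoop-core jar is present."""
    jars = hadoop_core_jars(filenames)
    if not jars:
        return "missing: no hadoop-core jar found"
    if len(jars) > 1:
        # More than one jar means conflicting Hadoop classes on the classpath
        return "conflict: " + ", ".join(jars)
    return "ok: " + jars[0]


if __name__ == "__main__":
    # Assumption: PDI_HOME points at the PDI install; the path layout follows the post above
    pdi_home = os.environ.get("PDI_HOME", ".")
    libext = os.path.join(pdi_home, "libext", "pentaho")
    if os.path.isdir(libext):
        print(check_hadoop_core(os.listdir(libext)))
    else:
        print("directory not found: " + libext)
```

If the result is "conflict", removing the extra jar and keeping only the one copied from the cluster is the step the post above describes.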

02-25-2012, 12:33 PM
Do you have a solution yet? I am facing the same issue. I tried the solution listed above, but it did not help.


09-27-2013, 11:30 AM
I am also having the same issue. My hadoop-core version is 1.0.1, and I have created a hadoop-101 plugin under:

I have added the hadoop-core-1.0.1.jar file to:

And I have set the plugin.properties to:

Thank you.

09-29-2013, 12:46 PM
I have a blog post that might help you get PDI running with Hadoop 1.x:


09-30-2013, 02:10 PM
I followed the steps in your blog post. The configuration for Hadoop 1.0.1 should not be very different from 1.0.3, and yet it does not work for me.

The problem could be in one (or more) of the following areas:
1) Pentaho BI Hadoop plugin
2) Pentaho BI settings other than the plugin
3) network (wrong port or blocked port)
4) Hadoop/HDFS or VM settings

I can ssh from the host PC into the VirtualBox VM and browse HDFS, so I think #3 is not the problem.
Hadoop seems to run fine on the VM, so I think #4 is not the problem either.
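Being able to ssh into the VM shows that port 22 is reachable, but not that the NameNode port accepts connections from the host. This is a minimal Python sketch that tests a plain TCP connection, assuming the VM address and port are passed on the command line (defaulting to localhost and 8020). A successful connection rules out a blocked port; it says nothing about Hadoop version compatibility.

```python
import socket
import sys


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host name and attempts a TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names
        return False


if __name__ == "__main__":
    # Assumed usage: python check_port.py <vm-address> [port]
    host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 8020
    status = "reachable" if port_open(host, port) else "not reachable"
    print(f"{host}:{port} is {status}")
```

If the check succeeds inside the VM but fails from the host, one possible cause is that the NameNode is listening only on the VM's loopback interface, for example when its configured URI uses localhost.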

Is there an example VirtualBox VM with a Hadoop version installed that works with the default hadoop-20 plugin? I think that would be a good addition to the Pentaho walk-through tutorials.

I assume that I should be using port 8020, since that matches the entry in my VM's core-site.xml, yes?
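The port to use is the one in the NameNode URI in core-site.xml. In Hadoop 1.x that URI is stored in the fs.default.name property (later versions use fs.defaultFS). This is a minimal Python sketch that reads either property from the file's contents and returns the host and port; the sample configuration in it is illustrative, not taken from the poster's VM.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def namenode_address(core_site_xml):
    """Return (host, port) from fs.default.name or fs.defaultFS, or None if absent."""
    root = ET.fromstring(core_site_xml)
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        if name in ("fs.default.name", "fs.defaultFS"):
            # e.g. hdfs://localhost:8020 -> ("localhost", 8020)
            uri = urlparse(prop.findtext("value", default="").strip())
            return uri.hostname, uri.port
    return None


if __name__ == "__main__":
    # Illustrative sample only; pass the real core-site.xml contents instead
    sample = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>"""
    print(namenode_address(sample))
```

To check a real file, pass its contents, for example namenode_address(open(path).read()) with path pointing at the VM's core-site.xml (in Hadoop 1.x it usually sits in the conf directory). If the value has no explicit port, the returned port is None.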

I was going to try to eliminate the network as the cause by running HDFS and Pentaho on the same machine, a MacBook Pro, but unfortunately the Hadoop version installed there is 1.2.1, which is again incompatible.
Thank you.

09-07-2015, 02:47 PM

Has anyone resolved this issue? I have posted a similar issue in the thread mentioned above.