Hitachi Vantara Pentaho Community Forums

Thread: Problem with accessing HDFS through Hadoop File Input

  1. #1
    Join Date
    Jun 2011

    Default Problem with accessing HDFS through Hadoop File Input

    I am trying to connect to Hadoop/HDFS (pseudo-distributed) using the Hadoop File Input step. I need to browse to a file on HDFS and select it in the Open File window that pops up. But when I click Connect, the URL that appears in "Open from Folder" is hdfs://username:password@localhost:8020, and I am unable to see any file or folder names on HDFS. Hadoop and PDI are installed under the same system user.
    These are the details I provided in the connection settings for the Hadoop File Input step:
    Server: localhost
    Port: 8020
    User ID: the username under which Hadoop and PDI are installed
    Password: the password for that user ID
    Kindly guide me.
    Last edited by namaa; 01-10-2012 at 04:30 AM.

  2. #2
    Join Date
    Aug 2010


    Please make sure the hadoop-core library found in the $PDI_HOME/libext/pentaho directory matches the one from your cluster, and that there is only one copy. Kettle includes the CDH3u0 hadoop-core jar by default.
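    The check above can be scripted. Here is a minimal sketch that looks for hadoop-core jars under the directory named in this post and flags a mismatch or duplicates; the `$PDI_HOME/libext/pentaho` layout is taken from the post, and the helper names are my own:

    ```python
    # Sketch: verify there is exactly one hadoop-core jar in the PDI lib
    # directory and that its version matches the cluster. Directory layout
    # ($PDI_HOME/libext/pentaho) follows the post above; adjust as needed.
    import glob
    import os
    import re

    def find_hadoop_core_jars(pdi_home):
        """Return (version, path) pairs for every hadoop-core jar found."""
        pattern = os.path.join(pdi_home, "libext", "pentaho", "hadoop-core-*.jar")
        jars = []
        for path in sorted(glob.glob(pattern)):
            m = re.search(r"hadoop-core-(.+)\.jar$", os.path.basename(path))
            jars.append((m.group(1) if m else "unknown", path))
        return jars

    def check_single_jar(pdi_home, cluster_version):
        """Return 'ok' or a human-readable description of the problem."""
        jars = find_hadoop_core_jars(pdi_home)
        if len(jars) != 1:
            return f"expected exactly one hadoop-core jar, found {len(jars)}"
        version, _ = jars[0]
        if version != cluster_version:
            return (f"jar version {version} does not match "
                    f"cluster version {cluster_version}")
        return "ok"
    ```

    For example, `check_single_jar("/opt/pdi", "0.20.2-cdh3u0")` should return "ok" only when a single matching jar is present.
    
    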

  3. #3
    Join Date
    Feb 2012


    Do you have a solution yet? I am facing the same issue. I tried the solution listed above, but it did not help.


  4. #4


    I am also having the same issue. My hadoop-core version is 1.0.1, and I have created a hadoop-101 plugin under:

    I have added the hadoop-core-1.0.1.jar file to:

    And I have set the to:

    thank you

  5. #5
    Join Date
    Sep 2012


    I have a blog post that might help you get PDI running with Hadoop 1.x:

  6. #6


    I have tried to follow the steps in your blog post, and it seems that the configuration for hadoop 1.0.1 should not be that different from 1.0.3, and yet it does not work for me.

    The problem could be in one (or more) of the following areas:
    1) Pentaho BI hadoop plugin
    2) Pentaho BI settings, other than the plugin
    3) network (wrong port or port blocked)
    4) hadoop/hdfs or VM settings

    I can ssh and browse HDFS from the host PC to the VirtualBox VM, so I think that #3 is not the problem.
    Hadoop seems to run fine on the VM, so I think that #4 is not the problem.
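    A quick way to double-check #3 independently of PDI is to test whether the NameNode RPC port even answers over TCP. A minimal sketch (the host and port 8020 are assumptions taken from this thread; substitute your VM's address):

    ```python
    # Sketch: rule out basic network problems by checking that the NameNode
    # RPC port accepts a TCP connection from the host PC.
    import socket

    def port_is_open(host, port, timeout=3.0):
        """Return True if a TCP connection to host:port succeeds."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False
    ```

    If `port_is_open("vm-address", 8020)` returns True but PDI still cannot browse HDFS, the problem is more likely a library or plugin mismatch than the network.
    
    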

    Is there an example VirtualBox VM that has a version of hadoop installed that does work with the default hadoop-20 plugin? I think that would be a good thing to add to the Pentaho walk-through tutorials.

    I assume that I should be using port 8020, since that matches the entry in my VM's core-site.xml, yes?
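    For reference, in a Hadoop 1.x pseudo-distributed setup the NameNode port comes from the fs.default.name property in core-site.xml, typically something like the fragment below (the hostname here is an assumption; yours may be localhost or the VM's address):

    ```xml
    <!-- core-site.xml fragment: fs.default.name determines the NameNode RPC
         port that PDI must connect to. Hostname is an assumption. -->
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:8020</value>
      </property>
    </configuration>
    ```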

    I was going to try to eliminate the network as the problem by running HDFS and Pentaho on the same machine, a MacBook Pro, but unfortunately the Hadoop version I have there is 1.2.1, which is again incompatible.
    thank you
    Last edited by Luke2; 09-30-2013 at 02:16 PM.

  7. #7
    Join Date
    Sep 2015


    Has anyone resolved this issue? I have posted a similar issue in the thread mentioned above.

