Hitachi Vantara Pentaho Community Forums

Thread: Connecting PDI to Hadoop on VMWare

  1. #1
    Join Date
    Sep 2012
    Posts
    2

    Default Connecting PDI to Hadoop on VMWare

    Hi All,

    I'm looking to run some POC with PDI and Hadoop.
    I've found different VMWare appliances of Hadoop.
    The PDI version is 4.3

    When I try to run a PDI job that copies a file to HDFS, it always pops up the following error:

    Could not resolve file "hdfs://hadoop:***@192.168.247.129:50070"

    I've copied different hadoop-core and commons-configuration jars into PDI and the big data plugin, unfortunately with no luck.

    I'll be grateful for your help!

  2. #2

    Unhappy Same Problem!

    Quote Originally Posted by Stopa1 View Post
    Hi All,

    I'm looking to run some POC with PDI and Hadoop.
    I've found different VMWare appliances of Hadoop.
    The PDI version is 4.3

    When I try to run a PDI job that copies a file to HDFS, it always pops up the following error:

    Could not resolve file "hdfs://hadoop:***@192.168.247.129:50070"

    I've copied different hadoop-core and commons-configuration jars into PDI and the big data plugin, unfortunately with no luck.

    I'll be grateful for your help!
    I seem to have the same problem. I tried checking the JobTracker port number, but it didn't work! I'd be glad if someone could help us out!

  3. #3
    Join Date
    Nov 2012
    Posts
    1

    Default

    Hi guys,
    Disclaimer: I'm a complete newbie, but I initially had similar problems too, so I just want to share what (partially) works for me.

    1st: As Stopa1 states, make sure the hadoop-core and commons-configuration jars in PDI are the same versions as the ones on your Hadoop node (a rough way to compare them is sketched below, after the 2nd step).

    2nd: try to follow the steps described in http://wiki.pentaho.com/display/BAD/...Data+into+HDFS
    Especially note step 5 and, most importantly, step 5b. In the Folder/File destination, try actually typing the target folder (hdfs://ip:port/user/blabla/bla) rather than clicking the 'Browse' button that takes you to the pop-up.
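
    As a rough way to do the jar check from the 1st step, here's a small Python sketch. Both directory paths are just assumptions (a typical PDI 4.3 big data lib dir, and a local copy of the jars pulled off the Hadoop VM), so adjust them to your setup:

    Code:
    # Compare hadoop-core / commons-configuration jar versions between PDI and Hadoop.
    # Both paths below are assumptions -- point them at your own PDI lib dir and at a
    # local copy of the jars taken from the Hadoop VM's lib directory.
    import glob
    import os

    pdi_lib = "/opt/pdi-4.3/libext/bigdata"   # assumed PDI big data lib dir
    hadoop_lib = "/tmp/hadoop-vm-lib"         # jars copied from the Hadoop VM

    def jar_names(directory, pattern):
        return sorted(os.path.basename(p) for p in glob.glob(os.path.join(directory, pattern)))

    for pattern in ("hadoop-core*.jar", "commons-configuration*.jar"):
        print(pattern)
        print("  PDI    :", jar_names(pdi_lib, pattern) or ["<none found>"])
        print("  Hadoop :", jar_names(hadoop_lib, pattern) or ["<none found>"])

    If the file names (and therefore the versions) differ between the two lists, that is the first thing to fix.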

    I personally find the 'Add' button a bit oddly located, but if you've typed in all the input fields (no browse buttons) and then click 'Add', your entry should show up in the table at the bottom, as shown in the screenshot. Then run the job, and if it gives you an authentication error (turn on debug logging to see it), you can reopen the step and adapt the username/password via the pop-up.
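
    Just to illustrate how that VFS-style destination URL is put together, here's a small sketch; the user, password, IP and port in it are placeholder values, not anything taken from the posts above:

    Code:
    # Sketch: how the hdfs:// URL typed into the Folder/File destination decomposes.
    # The credentials, IP, port and path are placeholder values for illustration only.
    from urllib.parse import urlsplit

    url = "hdfs://hadoop:secret@192.168.1.10:54310/user/pdi/weblogs"
    parts = urlsplit(url)

    print("scheme  :", parts.scheme)    # hdfs
    print("user    :", parts.username)  # hadoop
    print("password:", parts.password)  # secret
    print("host    :", parts.hostname)  # 192.168.1.10
    print("port    :", parts.port)      # 54310 (the namenode port, not the web UI port)
    print("path    :", parts.path)      # /user/pdi/weblogs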

    I find running the job provides more explicit information on what is and what isn't working.

    Other than that, make sure you've got the correct port number. I only mention this because in my installation (I've played around with Hadoop a bit, so I'm not sure whether this is the default or just my tinkering) port 50070 works fine for web browsing, but it's 54310 I have to use to connect to the namenode. I'm sure you guys know how to find the correct ports in the config files, but another quick way to check:

    [Attachment: screenshot01.jpg - the NameNode web UI showing the namenode address and port]

    You can see that I browse to a similar address to the one you use, but I have to use the port the namenode page indicates in my Pentaho configuration.
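
    For what it's worth, here's one more way to double-check which port the namenode is actually listening on: read it straight from the config on the Hadoop VM and test the connection. This is only a sketch; the core-site.xml path is an assumption for a typical Apache Hadoop 1.x layout, so your appliance may keep it elsewhere.

    Code:
    # Sketch: read fs.default.name from core-site.xml on the Hadoop VM and test
    # whether that host/port is reachable. The config path is an assumption for a
    # typical Apache Hadoop 1.x install.
    import socket
    import xml.etree.ElementTree as ET
    from urllib.parse import urlsplit

    CORE_SITE = "/usr/local/hadoop/conf/core-site.xml"   # assumed location

    props = {p.findtext("name"): p.findtext("value")
             for p in ET.parse(CORE_SITE).getroot().findall("property")}

    fs_default = props.get("fs.default.name", "")        # e.g. hdfs://localhost:54310
    print("fs.default.name =", fs_default)

    parts = urlsplit(fs_default)
    host, port = parts.hostname, parts.port
    try:
        socket.create_connection((host, port), timeout=5).close()
        print("namenode port %s on %s is reachable" % (port, host))
    except OSError as exc:
        print("cannot reach %s:%s -> %s" % (host, port, exc))

    Whatever port shows up there is the one that belongs in the hdfs:// URL in PDI, not the 50070 web UI port.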

    Hope some of this helps,

    C

  4. #4
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    None of these questions can be answered, since none of them specify which Hadoop distribution is being used, nor which version.
    Unfortunately (for Hadoop), every distro out there uses different configuration packages, different libraries, and so on.
