View Full Version : Connecting PDI to Hadoop on VMWare



Stopa1
09-30-2012, 05:03 PM
Hi All,

I'm looking to run a POC with PDI and Hadoop.
I've found various VMWare appliances of Hadoop.
The PDI version is 4.3.

When I try to run a PDI job to copy a file to HDFS, it always pops up the following error:

Could not resolve file "hdfs://hadoop:***@192.168.247.129:50070"

I've copied different hadoop-core and commons-configuration JARs into PDI and the big data plugin, unfortunately with no luck.

I'll be grateful for your help!

Vaishnavi Ravi
10-05-2012, 05:49 AM
I seem to have the same problem. I tried checking the JobTracker port number, but it didn't work! I'll be glad if someone could help us out!

Carlo
11-12-2012, 05:02 PM
Hi guys,
Disclaimer: I'm a complete newbie, but I initially had similar problems too, so I just want to share what works (partially) for me.

1st: As Stopa1 describes, make sure the hadoop-core and commons-configuration JARs in PDI are the same versions as the ones your Hadoop distribution uses.

2nd: try to follow the steps described in http://wiki.pentaho.com/display/BAD/Loading+Data+into+HDFS
Especially note step 5, and most importantly step 5b. In the Folder/File destination field, try actually typing the target folder (hdfs://ip:port/user/blabla/bla), rather than clicking the 'browse' button that takes you to the pop-up.
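
For example, reusing the address from the first post but with the NameNode RPC port and a purely hypothetical target folder (/user/pdi/input is just an illustration; substitute your own), the destination field would look something like:

hdfs://hadoop:***@192.168.247.129:54310/user/pdi/input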

I personally find the 'add' button a bit oddly located, but if you've typed all the input fields (no browse buttons) and then click 'add', your entry should show up in the bottom table, as shown in the screenshot. Then run the job; if it gives you an authentication error (turn on debug mode in the logging), you can reopen the screen and adapt the username/password via the pop-up.

I find running the job provides more explicit information on what is and what isn't working.

Other than that, make sure you've got the correct port number. I only mention this because in my installation (I've played around a bit with Hadoop, so I'm not sure whether this is the default or just my messing about), port 50070 works fine for web browsing, but it's 54310 I have to use for connecting to the NameNode. I'm sure you guys know how to find the correct ports in the config files, but here's another quick way to check:

[attachment 9776: screenshot of the NameNode web UI, showing the actual NameNode address and port]

You can see that I browse to a similar address to the one you use in your configuration, but I have to use the port the NameNode reports in my Pentaho configuration.
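
If you want to test the connection outside of PDI, here's a minimal Java sketch (my assumptions: a Hadoop 1.x client JAR on the classpath, and the IP/port values from this thread, so adjust to your own setup) that lists the HDFS root over the NameNode RPC port:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPortCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // 50070 is only the web UI; the filesystem itself listens on the
        // NameNode RPC port (54310 in my setup), which is what PDI needs.
        conf.set("fs.default.name", "hdfs://192.168.247.129:54310");
        FileSystem fs = FileSystem.get(conf);
        // List the HDFS root; if this works, the port is right.
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}

If this throws a connection or version-mismatch error, PDI will most likely hit the same wall.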

Hope some of this helps,

C

MattCasters
11-12-2012, 05:08 PM
None of these questions can be answered, since none of them specify the Hadoop distribution used, nor its version.
Unfortunately (for Hadoop), every distro out there uses different configuration packages, different libraries and so on.
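
As an illustration of why the version matters, here is a minimal sketch (assuming the same client JARs PDI uses are on the classpath) that prints the Hadoop version those JARs report, so you can compare it against what the cluster runs:

import org.apache.hadoop.util.VersionInfo;

public class ClientVersion {
    public static void main(String[] args) {
        // Reports the version of the Hadoop client JARs on the classpath,
        // to be compared against the version the cluster itself runs.
        System.out.println("Hadoop " + VersionInfo.getVersion()
                + " (build " + VersionInfo.getBuildVersion() + ")");
    }
}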