Hitachi Vantara Pentaho Community Forums

Thread: HDFS connection issue

  1. #1

HDFS connection issue

I am trying to follow the example from the Big Data Wiki about loading files to HDFS.
Unlike in the video, my Hadoop resides in a VM and is not accessible locally.

Although a competitor's tool works like a charm and lets me read data from the HDFS files, I cannot establish a connection from within Kettle.
I have included a "Hadoop Copy Files" entry and want to configure it to access the HDFS of my pseudo-distributed cluster.

I have entered the correct hostname and am using the same port as in the video (8020).
But I am getting the error "Unable to connect to HDFS server". I have tried other servers in my network and get the same error.

The competitor's tool only requires entering the hostname and accesses the files via port 50070.
If I change my configuration accordingly, Kettle gives the error "Error editing job entry".
It seems it can find the cluster, but cannot proceed.
    The "Details" button reveals the stack trace included at the end.

As I am able to connect with the competitor's tool, I rule out network problems for now and assume the problem lies with Kettle.

I have searched for similar posts without result; other posts refer to HBase or unrelated issues.
    Any ideas?

    Thanks in advance, Michael
    >>>>>>>>>>>>>>>>>
    java.lang.NullPointerException
    at org.pentaho.vfs.ui.VfsFileChooserDialog.setSelectedFile(VfsFileChooserDialog.java:978)
    at org.pentaho.di.ui.vfs.hadoopvfsfilechooserdialog.HadoopVfsFileChooserDialog$3.widgetSelected(HadoopVfsFileChooserDialog.java:235)
    at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
    at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
    at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
    at org.pentaho.vfs.ui.VfsFileChooserDialog.open(VfsFileChooserDialog.java:414)
    at org.pentaho.vfs.ui.VfsFileChooserDialog.open(VfsFileChooserDialog.java:342)
    at org.pentaho.di.ui.job.entries.hadoopcopyfiles.JobEntryHadoopCopyFilesDialog.setSelectedFile(JobEntryHadoopCopyFilesDialog.java:998)
    at org.pentaho.di.ui.job.entries.hadoopcopyfiles.JobEntryHadoopCopyFilesDialog.access$400(JobEntryHadoopCopyFilesDialog.java:73)
    at org.pentaho.di.ui.job.entries.hadoopcopyfiles.JobEntryHadoopCopyFilesDialog$11.widgetSelected(JobEntryHadoopCopyFilesDialog.java:548)
    at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
    at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
    at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
    at org.pentaho.di.ui.job.entries.hadoopcopyfiles.JobEntryHadoopCopyFilesDialog.open(JobEntryHadoopCopyFilesDialog.java:835)
    at org.pentaho.di.ui.spoon.delegates.SpoonJobDelegate.editJobEntry(SpoonJobDelegate.java:283)
    at org.pentaho.di.ui.spoon.Spoon.editJobEntry(Spoon.java:7711)
    at org.pentaho.di.ui.spoon.job.JobGraph.editEntry(JobGraph.java:2551)
    at org.pentaho.di.ui.spoon.job.JobGraph.mouseDoubleClick(JobGraph.java:601)
    at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
    at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
    at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
    at org.pentaho.di.ui.spoon.Spoon.readAndDispatch(Spoon.java:1221)
    at org.pentaho.di.ui.spoon.Spoon.waitForDispose(Spoon.java:7044)
    at org.pentaho.di.ui.spoon.Spoon.start(Spoon.java:8304)
    at org.pentaho.di.ui.spoon.Spoon.main(Spoon.java:580)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.pentaho.commons.launcher.Launcher.main(Launcher.java:134)
    >>>>>>>>>>>>>>>>>

  2. #2
Join Date: Sep 2012 | Posts: 71


The port to use is the one specified in your Hadoop cluster's core-site.xml under the "fs.default.name" property. You can also find it by opening the NameNode web UI in a browser; for a local Hadoop cluster, you'd go to:

    http://localhost:50070

    The page should show something like the following text at the top:

    NameNode 'localhost:<port>'

    where <port> is the value you want to use in the VFS browser.

    What distribution and version (Apache 0.20.2, Cloudera CDH4.1.1, etc.) is your Hadoop cluster?
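    For reference, a typical core-site.xml entry for a pseudo-distributed cluster looks like the following. The hostname and port here are illustrative; check the actual values in your own cluster's file:

    ```xml
    <!-- core-site.xml: example entry for a pseudo-distributed cluster.
         "localhost" and "8020" are example values; use whatever your
         cluster's configuration actually specifies. -->
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:8020</value>
      </property>
    </configuration>
    ```

    Whatever host and port appear in that value are what the Kettle VFS browser needs; port 50070 is only the NameNode's HTTP web UI, not the HDFS RPC port.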

  3. #3
Join Date: Apr 2013 | Posts: 1


I had this error and solved it. My root cause was that I was using a different distro of Hadoop and had to set up its folder under data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations and select it via data-integration/plugins/pentaho-big-data-plugin/plugin.properties. This blog was very helpful.
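    For anyone hitting the same thing, the relevant line in plugin.properties is the active.hadoop.configuration property, whose value must match the name of the shim's folder under hadoop-configurations. The folder name below is just the default example; substitute your own distro's shim folder:

    ```properties
    # data-integration/plugins/pentaho-big-data-plugin/plugin.properties
    # Selects the Hadoop shim by the name of its folder under
    # hadoop-configurations/. "hadoop-20" is the shipped default;
    # replace it with the folder for your distribution.
    active.hadoop.configuration=hadoop-20
    ```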

  4. #4
Join Date: Sep 2012 | Posts: 71


Heh, that's my blog! Glad you found it useful and that your problem is solved. I will likely update the blog post (or add a new one) for adding Hive 0.10 to the Apache 1.0.x shim. For now it should be compatible from Hive 0.7 to Hive 0.9.

  5. #5


Did you have to replace the default pentaho-hadoop-shims-hadoop-20-5.0.0.1.jar file to get the Pentaho to HDFS connection to work?

  6. #6


I was finally able to connect to Hadoop 1.2.1 HDFS running locally by following the instructions in Matt Burgess's blog. But that was only a local connection, where both HDFS and Pentaho DI were running on the same Mac. I have not been successful in connecting to HDFS running inside a VirtualBox VM. The firewall is off and the VM network is running in bridged mode, but there must be something wrong with my setup; this should work. Any suggestions would be greatly appreciated! Thank you.

  7. #7


I finally solved the problem. The core-site.xml inside the VM had fs.default.name set to hdfs://localhost:8020. That was fine for browser access to HDFS at http://{vm-ip}:50070/dfshealth.jsp, but it had to be changed to hdfs://{vm-ip}:8020 for the Pentaho Hadoop File Input to connect.
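    In other words, the change inside the VM was the following, with {vm-ip} standing in for the VM's actual bridged-network IP address:

    ```xml
    <!-- core-site.xml inside the VM; {vm-ip} is a placeholder for the
         VM's bridged-network IP address. With "localhost" here, the
         NameNode only advertised an address that was unreachable from
         outside the VM. -->
    <property>
      <name>fs.default.name</name>
      <!-- before: <value>hdfs://localhost:8020</value> -->
      <value>hdfs://{vm-ip}:8020</value>
    </property>
    ```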


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.