Hitachi Vantara Pentaho Community Forums

Thread: Unable to preview the content of a file loaded from HDFS in Spoon

  1. #1
    Join Date
    Apr 2012
    Posts
    8

    Unable to preview the content of a file loaded from HDFS in Spoon

    Hi, I am doing a demo project on Hadoop, and I am using Kettle because it takes the burden of coding away from me.

    I am running Hadoop 0.20.2 as a single-node installation on a remote server running CentOS 6.0.

    What I have done:

    I created a transformation, connected to HDFS, selected a text file, and added it to the Hadoop File Input step.
    Then I tried to preview the content of the file.

    The problem I encountered:

    org.pentaho.di.core.exception.KettleException:
    Error getting first 100 from file hdfs://****:****@50.31.134.130/user/hadoop/programex.txt

    Exception reading line: java.io.IOException: Could not obtain block: blk_-2373914758285898870_3237 file=/user/hadoop/programex.txt
    Could not obtain block: blk_-2373914758285898870_3237 file=/user/hadoop/programex.txt
    Could not obtain block: blk_-2373914758285898870_3237 file=/user/hadoop/programex.txt

    at org.pentaho.di.ui.trans.steps.hadoopfileinput.HadoopFileInputDialog.getFirst(HadoopFileInputDialog.java:2893)
    at org.pentaho.di.ui.trans.steps.hadoopfileinput.HadoopFileInputDialog.first(HadoopFileInputDialog.java:2765)
    at org.pentaho.di.ui.trans.steps.hadoopfileinput.HadoopFileInputDialog.access$200(HadoopFileInputDialog.java:115)
    at org.pentaho.di.ui.trans.steps.hadoopfileinput.HadoopFileInputDialog$3.handleEvent(HadoopFileInputDialog.java:472)
    at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
    at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
    at org.pentaho.di.ui.trans.steps.hadoopfileinput.HadoopFileInputDialog.open(HadoopFileInputDialog.java:664)
    at org.pentaho.di.ui.spoon.delegates.SpoonStepsDelegate.editStep(SpoonStepsDelegate.java:136)
    at org.pentaho.di.ui.spoon.Spoon.editStep(Spoon.java:7742)
    at org.pentaho.di.ui.spoon.trans.TransGraph.editStep(TransGraph.java:2755)
    at org.pentaho.di.ui.spoon.trans.TransGraph.mouseDoubleClick(TransGraph.java:704)
    at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
    at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
    at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
    at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
    at org.pentaho.di.ui.spoon.Spoon.readAndDispatch(Spoon.java:1180)
    at org.pentaho.di.ui.spoon.Spoon.start(Spoon.java:6954)
    at org.pentaho.di.ui.spoon.Spoon.main(Spoon.java:564)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.pentaho.commons.launcher.Launcher.main(Launcher.java:134)
    Caused by: org.pentaho.di.core.exception.KettleFileException:
    Exception reading line: java.io.IOException: Could not obtain block: blk_-2373914758285898870_3237 file=/user/hadoop/programex.txt
    Could not obtain block: blk_-2373914758285898870_3237 file=/user/hadoop/programex.txt
    Could not obtain block: blk_-2373914758285898870_3237 file=/user/hadoop/programex.txt
    at org.pentaho.di.trans.steps.textfileinput.TextFileInput.getLine(TextFileInput.java:170)
    at org.pentaho.di.trans.steps.textfileinput.TextFileInput.getLine(TextFileInput.java:94)
    at org.pentaho.di.ui.trans.steps.hadoopfileinput.HadoopFileInputDialog.getFirst(HadoopFileInputDialog.java:2882)
    ... 25 more
    Caused by: java.io.IOException: Could not obtain block: blk_-2373914758285898870_3237 file=/user/hadoop/programex.txt
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1812)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1638)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1767)
    at java.io.DataInputStream.read(Unknown Source)
    at java.io.BufferedInputStream.read1(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    at org.apache.commons.vfs.util.MonitorInputStream.read(Unknown Source)
    at sun.nio.cs.StreamDecoder.readBytes(Unknown Source)
    at sun.nio.cs.StreamDecoder.implRead(Unknown Source)
    at sun.nio.cs.StreamDecoder.read(Unknown Source)
    at sun.nio.cs.StreamDecoder.read0(Unknown Source)
    at sun.nio.cs.StreamDecoder.read(Unknown Source)
    at java.io.InputStreamReader.read(Unknown Source)
    at org.pentaho.di.trans.steps.textfileinput.TextFileInput.getLine(TextFileInput.java:109)
    ... 27 more

    I am working with a single-node Hadoop 0.20.2 installation and Kettle 4.3. I found out from this link http://wiki.pentaho.com/display/BAD/...adoop+Versions that no configuration changes should be needed, as Kettle comes pre-configured.

    Anyway, hadoop-0.20.2-core.jar ships by default in libext/pentaho, and the same JAR is present on my Hadoop single node.

    I checked libext/commons, found that commons-configuration-*.jar was missing, and placed the latest version of it there.
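
    To double-check that the two JARs really are the same build, I can compare checksums, something like this (the paths are assumptions based on a default Kettle 4.3 and Hadoop 0.20.2 layout):

    # On the machine running Spoon (path assumed for a default install)
    md5sum data-integration/libext/pentaho/hadoop-0.20.2-core.jar

    # On the Hadoop server ($HADOOP_HOME assumed to point at 0.20.2)
    md5sum $HADOOP_HOME/hadoop-0.20.2-core.jar

    # The two checksums should be identical if the JARs match.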

    But even then, the exception persists.

    Please help me resolve this issue. It would be helpful if anyone could describe the procedure in detail.

  2. #2
    Join Date
    Aug 2006
    Posts
    17


    The "Could not obtain block" message seems to indicate the HDFS cluster is not set up properly.
    I've had similar "block" messages which turned out to be related to bad network interconnection between the various HDFS nodes. In the end I had to rebuild the cluster.
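
    A quick way to check whether the DataNodes are alive and reporting blocks is dfsadmin (a standard Hadoop 0.20 command; run it on the cluster):

    hadoop dfsadmin -report
    # "Datanodes available: 0" would mean the NameNode has no DataNode
    # to serve blocks from, which shows up on the client as
    # "Could not obtain block" errors.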

  3. #3
    Join Date
    Apr 2012
    Posts
    8


    Thanks for the quick reply.

    But as I mentioned, I am running Hadoop in a single-node setup, not a distributed cluster. I just can't figure out what the problem might be.

  4. #4
    Join Date
    Nov 2011
    Posts
    18


    Are you able to hadoop fs -cat the file from the command line?

    hadoop fs -cat /user/hadoop/programex.txt | head -100

    If this works, it isolates the problem to Pentaho. If it does not work, the problem is with your Hadoop cluster and not with Pentaho; it could be an indication that the HDFS file system is corrupt, or of some other issue with the cluster.
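
    If the -cat fails, fsck can tell you more about the state of the file (a standard Hadoop command; the path is taken from your stack trace):

    hadoop fsck /user/hadoop/programex.txt -files -blocks -locations
    # Lists the file's blocks and the DataNodes holding replicas;
    # missing or corrupt blocks reported here would explain the
    # "Could not obtain block" error.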

    Chris

  5. #5
    Join Date
    Apr 2012
    Posts
    8


    Hi cdeptula,

    Thank you. I have tried to view the content of the file in HDFS.

    Actually, I am getting the following error:

    12/04/28 09:23:58 INFO ipc.Client: Retrying connect to server: /50.31.134.130:9000. Already tried 0 time(s).
    12/04/28 09:23:59 INFO ipc.Client: Retrying connect to server: /50.31.134.130:9000. Already tried 1 time(s).
    12/04/28 09:24:00 INFO ipc.Client: Retrying connect to server: /50.31.134.130:9000. Already tried 2 time(s).
    12/04/28 09:24:01 INFO ipc.Client: Retrying connect to server: /50.31.134.130:9000. Already tried 3 time(s).
    12/04/28 09:24:02 INFO ipc.Client: Retrying connect to server: /50.31.134.130:9000. Already tried 4 time(s).
    12/04/28 09:24:02 INFO ipc.Client: Retrying connect to server: /50.31.134.130:9000. Already tried 5 time(s).
    12/04/28 09:24:03 INFO ipc.Client: Retrying connect to server: /50.31.134.130:9000. Already tried 6 time(s).
    12/04/28 09:24:04 INFO ipc.Client: Retrying connect to server: /50.31.134.130:9000. Already tried 7 time(s).
    12/04/28 09:24:05 INFO ipc.Client: Retrying connect to server: /50.31.134.130:9000. Already tried 8 time(s).
    12/04/28 09:24:06 INFO ipc.Client: Retrying connect to server: /50.31.134.130:9000. Already tried 9 time(s).
    Bad connection to FS. command aborted.

    This is probably the reason.

    Please let me know how to proceed.

  6. #6
    Join Date
    Nov 2011
    Posts
    18


    Based on your latest error message, it looks like you are having trouble connecting to the NameNode. A few possible causes come to mind; see the command sketch after this list:

    1. Are you sure that port 9000 is the correct port for your Hadoop NameNode? People often use port 9000 or 8020 for the NameNode, but it can be configured to any port.
    2. Have you verified the NameNode daemon is running on the remote server? If you run the jps command, does NameNode appear in the list? Are you able to go to http://yourserver:50070 and browse your HDFS file system from there?
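
    For reference, here are those checks as shell commands. This is only a sketch: the IP and port come from your log, and $HADOOP_HOME/conf assumes a standard 0.20.x layout.

    # 1. Confirm the NameNode address/port configured on the cluster
    #    (fs.default.name is the 0.20.x property name)
    grep -A 2 fs.default.name $HADOOP_HOME/conf/core-site.xml

    # 2. Confirm the NameNode daemon is running on the server
    jps | grep NameNode

    # 3. Confirm the port is reachable from the machine running Spoon
    telnet 50.31.134.130 9000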

    Chris

  7. #7
    Join Date
    Apr 2012
    Posts
    8


    Hi Chris,

    Yes, I have configured port 9000 for the NameNode.

    I have executed the jps command and it listed all the nodes and trackers.

    And I have tried to connect to the NameNode and browse the file system from there, but the browser displays "Unable to open the page. The server is taking too long to respond."
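
    If both the web UI and port 9000 time out from outside, maybe the daemons are only listening on localhost, or a firewall is in the way. A sketch of what I can still check on the server (assuming standard Linux tools):

    # Show which local address the NameNode ports are bound to
    netstat -tlnp | grep -E ':(9000|50070)'
    # A local address of 127.0.0.1 would mean remote clients like Spoon
    # cannot reach the daemons; iptables -L would show any firewall rules.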
