Unable to preview the content of a file loaded from HDFS in Spoon



Viswajit
04-27-2012, 04:57 AM
Hi... I am doing a demo project on Hadoop. I am using Kettle because it takes the burden of coding away from me....

I am running Hadoop 0.20.2 on a remote server running CentOS 6.0, in a single-node distribution.

What I have done:

I created a transformation, connected to HDFS, selected a text file, and added it to the Hadoop File Input step.
Next, I tried to preview the content of the file.

The problem I encountered:

org.pentaho.di.core.exception.KettleException:
Error getting first 100 from file hdfs://****:****@50.31.134.130/user/hadoop/programex.txt

Exception reading line: java.io.IOException: Could not obtain block: blk_-2373914758285898870_3237 file=/user/hadoop/programex.txt
Could not obtain block: blk_-2373914758285898870_3237 file=/user/hadoop/programex.txt
Could not obtain block: blk_-2373914758285898870_3237 file=/user/hadoop/programex.txt

at org.pentaho.di.ui.trans.steps.hadoopfileinput.HadoopFileInputDialog.getFirst(HadoopFileInputDialog.java:2893)
at org.pentaho.di.ui.trans.steps.hadoopfileinput.HadoopFileInputDialog.first(HadoopFileInputDialog.java:2765)
at org.pentaho.di.ui.trans.steps.hadoopfileinput.HadoopFileInputDialog.access$200(HadoopFileInputDialog.java:115)
at org.pentaho.di.ui.trans.steps.hadoopfileinput.HadoopFileInputDialog$3.handleEvent(HadoopFileInputDialog.java:472)
at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
at org.pentaho.di.ui.trans.steps.hadoopfileinput.HadoopFileInputDialog.open(HadoopFileInputDialog.java:664)
at org.pentaho.di.ui.spoon.delegates.SpoonStepsDelegate.editStep(SpoonStepsDelegate.java:136)
at org.pentaho.di.ui.spoon.Spoon.editStep(Spoon.java:7742)
at org.pentaho.di.ui.spoon.trans.TransGraph.editStep(TransGraph.java:2755)
at org.pentaho.di.ui.spoon.trans.TransGraph.mouseDoubleClick(TransGraph.java:704)
at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
at org.pentaho.di.ui.spoon.Spoon.readAndDispatch(Spoon.java:1180)
at org.pentaho.di.ui.spoon.Spoon.start(Spoon.java:6954)
at org.pentaho.di.ui.spoon.Spoon.main(Spoon.java:564)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.pentaho.commons.launcher.Launcher.main(Launcher.java:134)
Caused by: org.pentaho.di.core.exception.KettleFileException:
Exception reading line: java.io.IOException: Could not obtain block: blk_-2373914758285898870_3237 file=/user/hadoop/programex.txt
Could not obtain block: blk_-2373914758285898870_3237 file=/user/hadoop/programex.txt
Could not obtain block: blk_-2373914758285898870_3237 file=/user/hadoop/programex.txt
at org.pentaho.di.trans.steps.textfileinput.TextFileInput.getLine(TextFileInput.java:170)
at org.pentaho.di.trans.steps.textfileinput.TextFileInput.getLine(TextFileInput.java:94)
at org.pentaho.di.ui.trans.steps.hadoopfileinput.HadoopFileInputDialog.getFirst(HadoopFileInputDialog.java:2882)
... 25 more
Caused by: java.io.IOException: Could not obtain block: blk_-2373914758285898870_3237 file=/user/hadoop/programex.txt
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1812)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1638)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1767)
at java.io.DataInputStream.read(Unknown Source)
at java.io.BufferedInputStream.read1(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at org.apache.commons.vfs.util.MonitorInputStream.read(Unknown Source)
at sun.nio.cs.StreamDecoder.readBytes(Unknown Source)
at sun.nio.cs.StreamDecoder.implRead(Unknown Source)
at sun.nio.cs.StreamDecoder.read(Unknown Source)
at sun.nio.cs.StreamDecoder.read0(Unknown Source)
at sun.nio.cs.StreamDecoder.read(Unknown Source)
at java.io.InputStreamReader.read(Unknown Source)
at org.pentaho.di.trans.steps.textfileinput.TextFileInput.getLine(TextFileInput.java:109)
... 27 more

I am working on a single-node distribution of Hadoop 0.20.2, and I am using Kettle 4.3. I found out from http://wiki.pentaho.com/display/BAD/Configure+Pentaho+for+Cloudera+and+Other+Hadoop+Versions that we need not make any configuration changes, as Kettle comes pre-configured for this version.

Anyway,
hadoop-0.20.2-core.jar ships by default in libext/pentaho, and the same JAR is present on my Hadoop single node.

I checked libext/commons, found that commons-configuration-*.jar was missing, and placed the latest version of it there.
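
One thing worth checking is whether the client JAR Kettle loads is byte-for-byte the same as the one on the cluster. A minimal sketch, assuming a default Kettle and Hadoop 0.20.2 tarball layout (adjust the paths to your install):

# On the machine running Spoon:
md5sum data-integration/libext/pentaho/hadoop-0.20.2-core.jar

# On the Hadoop server:
md5sum $HADOOP_HOME/hadoop-0.20.2-core.jar

# The checksums should match; a client/cluster version mismatch is a
# common source of strange HDFS errors in Kettle.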

But even then, the exception persists.... :mad:

Please help me resolve this issue. It would be helpful if anyone could describe the procedure to me in detail.

pstnotpd
04-27-2012, 05:14 AM
The "Could not obtain block" message seems to indicate the HDFS cluster is not set up properly.
I've had similar "block" messages which turned out to be related to bad network interconnection between the various HDFS nodes. In the end I had to rebuild the cluster.

Viswajit
04-27-2012, 06:13 AM
Thanks for the quick reply....

But as I mentioned, I am running Hadoop in a single-node setup, not on a distributed cluster. I just can't figure out what the problem might be....

cdeptula
04-27-2012, 09:46 AM
Are you able to hadoop fs -cat the file from the command line?

hadoop fs -cat /user/hadoop/programex.txt | head -100

If this works, it isolates the problem to Pentaho. If it does not, the problem is with your Hadoop cluster and not with Pentaho. It could be an indication that the Hadoop file system is corrupt, or some other issue with the cluster.
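
If the cat fails, fsck will also tell you whether the blocks themselves are healthy. A sketch (run it on the cluster; the path is the one from your error message):

hadoop fsck /user/hadoop/programex.txt -files -blocks -locations

# This reports each block of the file and which datanodes hold
# replicas; any MISSING or CORRUPT entries confirm a cluster-side
# problem rather than a Pentaho one.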

Chris

Viswajit
04-28-2012, 10:28 AM
Hi cdeptula...

Thank you.... I tried to view the content of the file in HDFS...

Actually, I am getting the following error:

12/04/28 09:23:58 INFO ipc.Client: Retrying connect to server: /50.31.134.130:9000. Already tried 0 time(s).
12/04/28 09:23:59 INFO ipc.Client: Retrying connect to server: /50.31.134.130:9000. Already tried 1 time(s).
12/04/28 09:24:00 INFO ipc.Client: Retrying connect to server: /50.31.134.130:9000. Already tried 2 time(s).
12/04/28 09:24:01 INFO ipc.Client: Retrying connect to server: /50.31.134.130:9000. Already tried 3 time(s).
12/04/28 09:24:02 INFO ipc.Client: Retrying connect to server: /50.31.134.130:9000. Already tried 4 time(s).
12/04/28 09:24:02 INFO ipc.Client: Retrying connect to server: /50.31.134.130:9000. Already tried 5 time(s).
12/04/28 09:24:03 INFO ipc.Client: Retrying connect to server: /50.31.134.130:9000. Already tried 6 time(s).
12/04/28 09:24:04 INFO ipc.Client: Retrying connect to server: /50.31.134.130:9000. Already tried 7 time(s).
12/04/28 09:24:05 INFO ipc.Client: Retrying connect to server: /50.31.134.130:9000. Already tried 8 time(s).
12/04/28 09:24:06 INFO ipc.Client: Retrying connect to server: /50.31.134.130:9000. Already tried 9 time(s).
Bad connection to FS. command aborted.

Probably this might be the reason...

Please be kind enough to let me know how to proceed.....

cdeptula
04-28-2012, 03:57 PM
Based on your latest error message, it looks like you are having trouble connecting to the NameNode. A few possible causes come to mind:

1. Are you sure that port 9000 is the correct port for your Hadoop NameNode? People often use port 9000 or 8020 for the NameNode, but it can be configured to any port number.
2. Have you verified the NameNode daemon is running on the remote server? If you run the jps command, does NameNode appear in the list? Are you able to go to http://yourserver:50070 and browse your HDFS file system from there? (A quick sketch of these checks is below.)
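
Something like the following, run on the server, would cover those checks (a hypothetical session; the ports are the defaults assumed from your posts):

# List running Hadoop daemons; NameNode should appear:
jps

# Confirm something is listening on the NameNode RPC port:
netstat -tln | grep 9000

# Check that the NameNode web UI responds locally:
curl -s http://localhost:50070/ | head

# If all of these succeed on the server but fail from the Spoon
# machine, suspect a firewall between the two hosts.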

Chris

Viswajit
04-29-2012, 03:42 AM
Hi Chris...

Yes... I have configured port 9000 for the NameNode.

I executed the jps command, and it listed all the nodes and trackers.....

And I tried to connect to the NameNode and browse the file system from there... Actually, the browser displays "Unable to open the page.... The server is taking too long to respond"...
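
For completeness, these are the remaining checks that would narrow it down on the server (a sketch, not verified here; the service name assumes CentOS 6 defaults, and $HADOOP_HOME stands in for the actual install directory):

# Which address/interface are the NameNode ports bound to?
netstat -tln | grep -E ':(9000|50070)'

# CentOS 6 ships with iptables enabled by default; is it dropping
# remote connections?
sudo service iptables status

# What address is HDFS actually configured with? (0.20.2 keeps it
# in conf/core-site.xml)
grep -A 2 'fs.default.name' $HADOOP_HOME/conf/core-site.xml
# If the <value> is hdfs://localhost:9000, remote clients will not
# be able to connect.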