Viswajit
04-27-2012, 04:57 AM
Hi, I am doing a demo project on Hadoop. I am using Kettle because it takes the burden of coding away from me.
I am running hadoop-0.20.2 as a single-node setup on a remote server running CentOS 6.0 Linux.
What I have done:
In a transformation, I connected to HDFS, selected a text file, and added it to the Hadoop File Input step.
Next, I tried to preview the content of the file.
The problem I encountered:
org.pentaho.di.core.exception.KettleException:
Error getting first 100 from file hdfs://****:****@50.31.134.130/user/hadoop/programex.txt
Exception reading line: java.io.IOException: Could not obtain block: blk_-2373914758285898870_3237 file=/user/hadoop/programex.txt
Could not obtain block: blk_-2373914758285898870_3237 file=/user/hadoop/programex.txt
Could not obtain block: blk_-2373914758285898870_3237 file=/user/hadoop/programex.txt
at org.pentaho.di.ui.trans.steps.hadoopfileinput.HadoopFileInputDialog.getFirst(HadoopFileInputDialog.java:2893)
at org.pentaho.di.ui.trans.steps.hadoopfileinput.HadoopFileInputDialog.first(HadoopFileInputDialog.java:2765)
at org.pentaho.di.ui.trans.steps.hadoopfileinput.HadoopFileInputDialog.access$200(HadoopFileInputDialog.java:115)
at org.pentaho.di.ui.trans.steps.hadoopfileinput.HadoopFileInputDialog$3.handleEvent(HadoopFileInputDialog.java:472)
at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
at org.pentaho.di.ui.trans.steps.hadoopfileinput.HadoopFileInputDialog.open(HadoopFileInputDialog.java:664)
at org.pentaho.di.ui.spoon.delegates.SpoonStepsDelegate.editStep(SpoonStepsDelegate.java:136)
at org.pentaho.di.ui.spoon.Spoon.editStep(Spoon.java:7742)
at org.pentaho.di.ui.spoon.trans.TransGraph.editStep(TransGraph.java:2755)
at org.pentaho.di.ui.spoon.trans.TransGraph.mouseDoubleClick(TransGraph.java:704)
at org.eclipse.swt.widgets.TypedListener.handleEvent(Unknown Source)
at org.eclipse.swt.widgets.EventTable.sendEvent(Unknown Source)
at org.eclipse.swt.widgets.Widget.sendEvent(Unknown Source)
at org.eclipse.swt.widgets.Display.runDeferredEvents(Unknown Source)
at org.eclipse.swt.widgets.Display.readAndDispatch(Unknown Source)
at org.pentaho.di.ui.spoon.Spoon.readAndDispatch(Spoon.java:1180)
at org.pentaho.di.ui.spoon.Spoon.start(Spoon.java:6954)
at org.pentaho.di.ui.spoon.Spoon.main(Spoon.java:564)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.pentaho.commons.launcher.Launcher.main(Launcher.java:134)
Caused by: org.pentaho.di.core.exception.KettleFileException:
Exception reading line: java.io.IOException: Could not obtain block: blk_-2373914758285898870_3237 file=/user/hadoop/programex.txt
Could not obtain block: blk_-2373914758285898870_3237 file=/user/hadoop/programex.txt
Could not obtain block: blk_-2373914758285898870_3237 file=/user/hadoop/programex.txt
at org.pentaho.di.trans.steps.textfileinput.TextFileInput.getLine(TextFileInput.java:170)
at org.pentaho.di.trans.steps.textfileinput.TextFileInput.getLine(TextFileInput.java:94)
at org.pentaho.di.ui.trans.steps.hadoopfileinput.HadoopFileInputDialog.getFirst(HadoopFileInputDialog.java:2882)
... 25 more
Caused by: java.io.IOException: Could not obtain block: blk_-2373914758285898870_3237 file=/user/hadoop/programex.txt
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1812)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1638)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1767)
at java.io.DataInputStream.read(Unknown Source)
at java.io.BufferedInputStream.read1(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at org.apache.commons.vfs.util.MonitorInputStream.read(Unknown Source)
at sun.nio.cs.StreamDecoder.readBytes(Unknown Source)
at sun.nio.cs.StreamDecoder.implRead(Unknown Source)
at sun.nio.cs.StreamDecoder.read(Unknown Source)
at sun.nio.cs.StreamDecoder.read0(Unknown Source)
at sun.nio.cs.StreamDecoder.read(Unknown Source)
at java.io.InputStreamReader.read(Unknown Source)
at org.pentaho.di.trans.steps.textfileinput.TextFileInput.getLine(TextFileInput.java:109)
... 27 more
I am working with a single-node Hadoop-0.20.2 setup and I am using Kettle 4.3. According to http://wiki.pentaho.com/display/BAD/Configure+Pentaho+for+Cloudera+and+Other+Hadoop+Versions, we do not need to make any configuration changes because it ships pre-configured.
Anyway, hadoop-0.20.2-core.jar comes by default in libext/pentaho, and the same JAR is present on my single-node Hadoop installation.
I checked libext/commons and found that commons-configuration-*.jar was missing, so I placed the latest version of it in libext/commons.
But even then, the exception persists. :mad:
Please help me resolve this issue. It would be helpful if anyone could describe the procedure in detail.