Pentaho Map Reduce - Invalid byte 2 of 4-byte UTF-8 sequence Error



iShotAlex
01-29-2013, 11:54 AM
Hello,

I would really appreciate some help with this issue.

I'm getting the following error when running a Pentaho Map/Reduce job:


java.io.IOException: java.lang.RuntimeException: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.

The problem seems to reside in the reducer transformation (which is quite complex): if I replace it with a simpler reducer, it seems to work fine. If I run the transformation independently, it also runs fine.

Please note that the error occurs when the Map Reduce job starts. Here's the complete log; I will post any other info as needed.

Thanks in advance for any input!!

Alex


2013/01/29 16:42:42 - Spoon - Starting job...
2013/01/29 16:42:42 - SFW_MapReduce - Start of job execution
2013/01/29 16:42:42 - SFW_MapReduce - exec(0, 0, START.0)
2013/01/29 16:42:42 - START - Starting job entry
2013/01/29 16:42:42 - SFW_MapReduce - Starting entry [SFW_MapReduce]
2013/01/29 16:42:42 - SFW_MapReduce - exec(1, 0, SFW_MapReduce.0)
2013/01/29 16:42:42 - SFW_MapReduce - Starting job entry
2013/01/29 16:42:43 - Transformation metadata - The shared object fie [null] is empty!
2013/01/29 16:42:48 - Transformation metadata - The shared object fie [null] is empty!
2013/01/29 16:42:53 - SFW_MapReduce - Cleaning output path: hdfs://xxx/user/bio/SFW_KPI
2013/01/29 16:42:53 - SFW_MapReduce - Using Kettle installation from /user/bio/4.4.0-1.3.0-cdh3u4
2013/01/29 16:42:53 - SFW_MapReduce - Configuring Pentaho MapReduce job to use Kettle installation from /user/bio/4.4.0-1.3.0-cdh3u4
2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : java.io.IOException: java.lang.RuntimeException: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3766)
at sun.reflect.GeneratedMethodAccessor89.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)
Caused by: java.lang.RuntimeException: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1387)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1261)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1192)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:415)
at org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:1957)
at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:386)
at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:414)
at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3764)
... 10 more
Caused by: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:684)
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:470)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLEntityScanner.java:1416)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2793)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:235)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1313)
... 17 more


2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.RuntimeException: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3766)
at sun.reflect.GeneratedMethodAccessor89.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)
Caused by: java.lang.RuntimeException: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1387)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1261)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1192)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:415)
at org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:1957)
at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:386)
at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:414)
at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3764)
... 10 more
Caused by: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:684)
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:470)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLEntityScanner.java:1416)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2793)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:235)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1313)
... 17 more


2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.apache.hadoop.ipc.Client.call(Client.java:1107)
2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.apache.hadoop.mapred.$Proxy19.submitJob(Unknown Source)
2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:910)
2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at java.security.AccessController.doPrivileged(Native Method)
2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at javax.security.auth.Subject.doAs(Unknown Source)
2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)
2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.pentaho.hadoop.shim.common.CommonHadoopShim.submitJob(CommonHadoopShim.java:201)
2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.pentaho.di.job.entries.hadooptransjobexecutor.JobEntryHadoopTransJobExecutor.execute(JobEntryHadoopTransJobExecutor.java:806)
2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.pentaho.di.job.Job.execute(Job.java:589)
2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.pentaho.di.job.Job.execute(Job.java:728)
2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.pentaho.di.job.Job.execute(Job.java:443)
2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : at org.pentaho.di.job.Job.run(Job.java:363)
2013/01/29 16:42:54 - SFW_MapReduce - Finished job entry [SFW_MapReduce] (result=[false])
2013/01/29 16:42:54 - SFW_MapReduce - Job execution finished
2013/01/29 16:42:54 - Spoon - Job has ended.

iShotAlex
01-29-2013, 01:00 PM
Hello,

Resolved: I'm using a Spanish localized version of Kettle, where the JOIN step gets translated into "Unión por Clave". Everything works fine when running the transformation on a Windows server, but not when running it as a reducer (the Hadoop cluster is Linux). I removed the "ó" character from the step name (it was presumably not reaching the cluster as valid UTF-8) and everything ran flawlessly.
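
The character "ó" can be encoded in UTF-8, so the error message is more consistent with the step name having been written in a single-byte encoding and then parsed as UTF-8. The sketch below assumes that encoding was ISO-8859-1 (Latin-1); the thread does not confirm which encoding was actually involved. It shows why a parser would report a broken 4-byte sequence: in Latin-1, "ó" is the byte 0xF3, which UTF-8 treats as the lead byte of a 4-byte sequence, and the following "n" is not a valid second byte.

```python
# Assumption: the step name reached the cluster encoded as ISO-8859-1 (Latin-1).
step_name = "Unión por Clave"

latin1_bytes = step_name.encode("latin-1")
utf8_bytes = step_name.encode("utf-8")

# In Latin-1, "ó" is the single byte 0xF3 (index 3 of the encoded name).
bad_byte = latin1_bytes[3]

# A UTF-8 lead byte of the form 11110xxx announces a 4-byte sequence.
announces_four_bytes = (bad_byte & 0b11111000) == 0b11110000

# The next byte is "n" (0x6E), which is not a continuation byte (10xxxxxx),
# so byte 2 of the announced 4-byte sequence is invalid.
next_is_continuation = (latin1_bytes[4] & 0b11000000) == 0b10000000

# Decoding the Latin-1 bytes as UTF-8 fails at the same position.
try:
    latin1_bytes.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

print(hex(bad_byte), announces_four_bytes, next_is_continuation)
print(utf8_bytes[3:5].hex())  # the same character as valid UTF-8: two bytes
```

If this reading is right, saving the same name as real UTF-8 should also avoid the error, although the thread only reports that removing the character worked.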

Non-English speakers should be careful with the names they give to the steps and with what they write in the notes.
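
One way to act on this advice is to scan the text of a transformation for non-ASCII characters before submitting it to the cluster. The following is only a minimal sketch: `find_non_ascii` and the sample text are hypothetical and not part of Kettle, and the sample is not taken from a real transformation file.

```python
def find_non_ascii(text):
    """Return (line_number, column, character) for every non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, ch))
    return hits


# Hypothetical sample of step names, one per line.
sample = "Merge Join\nUnión por Clave\nSort rows"

for line_no, col, ch in find_non_ascii(sample):
    print(f"line {line_no}, column {col}: {ch!r}")
```

Line and column numbers are 1-based, so a hit can be looked up directly in a text editor.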

Cheers,

Alex