Hitachi Vantara Pentaho Community Forums

Thread: Pentaho Map Reduce - Invalid byte 2 of 4-byte UTF-8 sequence Error

  1. #1


    Hello,

    I would really appreciate some help with this issue.

    I'm getting the following error when running a Pentaho Map/Reduce job:

    java.io.IOException: java.lang.RuntimeException: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
    The problem seems to reside in the reducer transformation (which is quite complex): if I replace it with a simpler reducer, the job works fine. If I run the transformation on its own, it also runs fine.

    Please note that the error occurs when the Map Reduce job starts. Here's the complete log; I will post any other info as needed.

    Thanks in advance for any input!!

    Alex

    Code:
    2013/01/29 16:42:42 - Spoon - Starting job...
    2013/01/29 16:42:42 - SFW_MapReduce - Start of job execution
    2013/01/29 16:42:42 - SFW_MapReduce - exec(0, 0, START.0)
    2013/01/29 16:42:42 - START - Starting job entry
    2013/01/29 16:42:42 - SFW_MapReduce - Starting entry [SFW_MapReduce]
    2013/01/29 16:42:42 - SFW_MapReduce - exec(1, 0, SFW_MapReduce.0)
    2013/01/29 16:42:42 - SFW_MapReduce - Starting job entry
    2013/01/29 16:42:43 - Transformation metadata - The shared object fie [null] is empty!
    2013/01/29 16:42:48 - Transformation metadata - The shared object fie [null] is empty!
    2013/01/29 16:42:53 - SFW_MapReduce - Cleaning output path: hdfs://xxx/user/bio/SFW_KPI
    2013/01/29 16:42:53 - SFW_MapReduce - Using Kettle installation from /user/bio/4.4.0-1.3.0-cdh3u4
    2013/01/29 16:42:53 - SFW_MapReduce - Configuring Pentaho MapReduce job to use Kettle installation from /user/bio/4.4.0-1.3.0-cdh3u4
    2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : java.io.IOException: java.lang.RuntimeException: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
        at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3766)
        at sun.reflect.GeneratedMethodAccessor89.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)
    Caused by: java.lang.RuntimeException: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1387)
        at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1261)
        at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1192)
        at org.apache.hadoop.conf.Configuration.get(Configuration.java:415)
        at org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:1957)
        at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:386)
        at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:414)
        at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3764)
        ... 10 more
    Caused by: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
        at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:684)
        at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:470)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLEntityScanner.java:1416)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2793)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
        at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:235)
        at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1313)
        ... 17 more
    
    
    2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.RuntimeException: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
        at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3766)
        at sun.reflect.GeneratedMethodAccessor89.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)
    Caused by: java.lang.RuntimeException: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1387)
        at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1261)
        at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1192)
        at org.apache.hadoop.conf.Configuration.get(Configuration.java:415)
        at org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:1957)
        at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:386)
        at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:414)
        at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3764)
        ... 10 more
    Caused by: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence.
        at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:684)
        at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:470)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLEntityScanner.java:1416)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2793)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
        at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:235)
        at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1313)
        ... 17 more
    
    
    2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) :     at org.apache.hadoop.ipc.Client.call(Client.java:1107)
    2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) :     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
    2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) :     at org.apache.hadoop.mapred.$Proxy19.submitJob(Unknown Source)
    2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) :     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:910)
    2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) :     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
    2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) :     at java.security.AccessController.doPrivileged(Native Method)
    2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) :     at javax.security.auth.Subject.doAs(Unknown Source)
    2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) :     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
    2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) :     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
    2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) :     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)
    2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) :     at org.pentaho.hadoop.shim.common.CommonHadoopShim.submitJob(CommonHadoopShim.java:201)
    2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) :     at org.pentaho.di.job.entries.hadooptransjobexecutor.JobEntryHadoopTransJobExecutor.execute(JobEntryHadoopTransJobExecutor.java:806)
    2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) :     at org.pentaho.di.job.Job.execute(Job.java:589)
    2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) :     at org.pentaho.di.job.Job.execute(Job.java:728)
    2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) :     at org.pentaho.di.job.Job.execute(Job.java:443)
    2013/01/29 16:42:54 - SFW_MapReduce - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) :     at org.pentaho.di.job.Job.run(Job.java:363)
    2013/01/29 16:42:54 - SFW_MapReduce - Finished job entry [SFW_MapReduce] (result=[false])
    2013/01/29 16:42:54 - SFW_MapReduce - Job execution finished
    2013/01/29 16:42:54 - Spoon - Job has ended.

  2. #2


    Hello,

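    Resolved: I'm using a Spanish localized version of Kettle, in which the JOIN step gets translated into "Unión por Clave". Everything works fine when running the transformation on a Windows server, but not when running it as a reducer (the Hadoop cluster is Linux). The "ó" itself is a valid character, but it had apparently been saved in a non-UTF-8 encoding: as a single byte 0xF3 (Latin-1 or Windows-1252) it looks like the start of a 4-byte UTF-8 sequence, which matches the error message. I removed the "ó" character and everything ran flawlessly.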
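
For anyone who lands here with the same message, this is a minimal sketch of why a single accented letter can produce exactly "Invalid byte 2 of 4-byte UTF-8 sequence". It assumes the step name was saved in Windows-1252 (the thread does not confirm the actual encoding), and the helper names `utf8_sequence_length` and `is_continuation` are made up for illustration. It uses Python only because it is quick to run; the same byte-level logic applies to the Java XML parser in the stack trace.

```python
# Step name from the Spanish localization of Kettle, as quoted in the post.
step_name = "Unión por Clave"

# Assumption: the name was written with a Windows single-byte code page.
cp1252_bytes = step_name.encode("cp1252")
o_acute_cp1252 = "ó".encode("cp1252")  # b'\xf3' (one byte)
o_acute_utf8 = "ó".encode("utf-8")     # b'\xc3\xb3' (two bytes)


def utf8_sequence_length(lead_byte: int) -> int:
    """Return the sequence length a UTF-8 decoder expects after this lead byte."""
    if lead_byte < 0x80:
        return 1
    if 0xC2 <= lead_byte <= 0xDF:
        return 2
    if 0xE0 <= lead_byte <= 0xEF:
        return 3
    if 0xF0 <= lead_byte <= 0xF4:
        return 4
    raise ValueError(f"0x{lead_byte:02x} cannot start a UTF-8 sequence")


def is_continuation(byte: int) -> bool:
    """UTF-8 continuation bytes have the bit pattern 10xxxxxx."""
    return 0x80 <= byte <= 0xBF


lead = cp1252_bytes[3]       # the byte that stands for "ó"
following = cp1252_bytes[4]  # the "n" right after it

print(hex(lead), utf8_sequence_length(lead))  # 0xf3 4
print(is_continuation(following))             # False

# A strict UTF-8 decoder therefore rejects the second byte of the sequence.
try:
    cp1252_bytes.decode("utf-8")
    failure = None
except UnicodeDecodeError as err:
    failure = err
print(failure)
```

The byte 0xF3 announces a 4-byte sequence, the following "n" is not a continuation byte, and a strict decoder gives up at byte 2. Had the name been written as UTF-8, the "ó" would have been the two bytes 0xC3 0xB3 and would decode cleanly.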

    Non-English speakers should be careful with the names they give to steps and with what they write in the notes.
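
One way to check step names and notes before submitting a job is to scan the saved transformation for bytes that are not valid UTF-8. The following is only a rough sketch: `find_invalid_utf8` is a hypothetical helper, not part of Kettle or Hadoop, and the sample XML fragment is invented for the demonstration rather than copied from a real transformation file.

```python
def find_invalid_utf8(data: bytes) -> list:
    """Return the byte offsets at which strict UTF-8 decoding of `data` fails."""
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the data is valid UTF-8
        except UnicodeDecodeError as err:
            offsets.append(pos + err.start)
            pos += err.start + 1  # skip the offending byte and keep scanning
    return offsets


# Invented fragment: the same step name saved in two different encodings.
fragment = "<name>Unión por Clave</name>"
saved_as_cp1252 = fragment.encode("cp1252")
saved_as_utf8 = fragment.encode("utf-8")

print(find_invalid_utf8(saved_as_cp1252))  # [9]
print(find_invalid_utf8(saved_as_utf8))    # []
```

To check a real file, read it in binary mode (for example `open(path, "rb").read()`, where `path` points to your own transformation) and pass the bytes to the function. Any reported offset points at a character that a strict UTF-8 parser would reject.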

    Cheers,

    Alex


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.