Hitachi Vantara Pentaho Community Forums
Results 1 to 2 of 2

Thread: Issues with Character encoding

  1. #1

    Unhappy Issues with Character encoding

    Good Afternoon All,

    Starting to get a headache with this - been playing around with it for ever!

    We are in the process of replacing a legacy ETL program written in PHP with the Pentaho Kettle Data Integration (Spoon) program.

    I am attempting to migrate data from a table defined as:

    CREATE TABLE `Languages` (
    `Language` varchar(40) NOT NULL DEFAULT '',
    `ShortCode` varchar(8) NOT NULL DEFAULT '',
    `LanguageName` varchar(40) NOT NULL DEFAULT '',
    PRIMARY KEY (`Language`),
    KEY `ShortCode_Language` (`ShortCode`,`Language`)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1
    ;

    To a table defined as:

    CREATE TABLE `stg_hr_language` (
    `idLanguage` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
    `LanguageCode` varchar(8) NOT NULL DEFAULT '',
    `LanguageName` varchar(40) NOT NULL DEFAULT 'Not Defined',
    `AltLanguageName` varchar(40) NOT NULL DEFAULT 'Not Defined',
    PRIMARY KEY (`idLanguage`),
    KEY `Code_Language` (`LanguageCode`,`LanguageName`)
    ) ENGINE=InnoDB AUTO_INCREMENT=27 DEFAULT CHARSET=utf8
    ;

    I have a Transformation job with 2 objects on it. A Table Input defined as:

    SELECT case when ifnull(Language, '') = '' then 'Not Defined' else convert(convert(Language using binary) using utf8) end Language,
    ShortCode,
    case when ifnull(LanguageName, '') = '' then 'Not Defined' else convert(convert(LanguageName using binary) using utf8) end LanguageName
    FROM hostel_rob.Languages

    and an Insert / Update to map:

    `stg_hr_language`. idLanguage (not mapped as it auto increments)
    `stg_hr_language`. LanguageCode = `Languages`. ShortCode (lookup key value)
    `stg_hr_language`. LanguageName = `Languages`. Language
    `stg_hr_language`. AltLanguageName = `Languages`. LanguageName

    The last three items are all configured as Update = 'Y'.

    The job completes without any errors, however, when i then try to do a select 8 from stg_hr_language all the Chinese Characters appear as ? rather than the correct character.

    When i perform this migration directly through MySQL this works, so can only assume that there is some Character Encoding happening somewhere that i am not aware of.

    Can someone please give me some hwlp with this, as all the source tables use latin1 and all the target tables need to use utf8 so i really need to get this working...

    Thanks in advance,

    Scott

  2. #2

    Default

    For the time being i have been forced to move on to Talend, as it does maintain the character encoding. It's a shame as i much prefer the Pentaho tool and find it much easier to use.

    Support guys - if you're seeing this - please give me some guidance here or you'll lose a customer!

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.