Hitachi Vantara Pentaho Community Forums

Thread: assign hex code of character to Variable or Parameter

  1. #1

    Hi,

    I have a problem with the standard field separator used in most of my source files. The separator is the 'thorn', ' þ '. The advantage of using the thorn is that it rarely, if ever, appears in the data itself.

    The disadvantage is that every time I enter it hard-coded as the field separator in a transformation that is run by a job, or even put it in a variable, the job crashes with:

    "Caused by: org.apache.xerces.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence."

    I can't change the field separator anyway, so I have to solve this differently. I would like to set the value in kettle.properties using its hex code, '00FE' (the Unicode code point U+00FE; its UTF-8 encoding is the two bytes C3 BE), so that my Kettle client on Windows and the PDI server on Linux interpret the field separator the same way.
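    The difference between a code point and its encoded bytes is what the Xerces error is about. The Python sketch below (Python is used only as a neutral illustration; it is not part of PDI) shows that þ is U+00FE, that it is stored as C3 BE in UTF-8 but as the single byte FE in Windows-1252, and that a lone FE byte is not valid UTF-8. That the transformation file ended up with the single-byte form while the parser expected UTF-8 is an assumption, not something confirmed in this thread.

```python
def describe_thorn() -> dict:
    """Show the code point of þ and how two encodings store it."""
    thorn = "\u00fe"  # LATIN SMALL LETTER THORN (þ)
    return {
        "code_point": f"U+{ord(thorn):04X}",                # the '00FE' from the post
        "utf8": thorn.encode("utf-8").hex(" ").upper(),     # two bytes
        "cp1252": thorn.encode("cp1252").hex(" ").upper(),  # one byte
    }


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(describe_thorn())
# A lone 0xFE byte (þ as a single-byte Windows editor would save it)
# is rejected by a strict UTF-8 decoder, similar to the Xerces error.
print(is_valid_utf8(b"\xfe"))      # False
print(is_valid_utf8(b"\xc3\xbe"))  # True
```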

    Can you set this value in kettle.properties and what would be the syntax for it?

  2. #2
    Join Date
    Apr 2009
    Posts
    12

    Has anybody come up with an answer to this? I'm having the same issue.

  3. #3
    Join Date
    Nov 1999
    Posts
    9,729

    You can use the format $[00FE] or, if you have several values in a row, $[31,32,33,34,35].
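    Reading this reply literally, each comma-separated hex value inside $[...] stands for one byte, so $[31,32,33,34,35] would be the ASCII text "12345" and $[00FE] the single byte FE. The Python sketch below illustrates that reading with a hypothetical helper, hex_notation_to_bytes; it is not PDI code, and how Kettle then turns those bytes into characters is not stated in the thread.

```python
import re

# One $[..] group of comma-separated hex values, e.g. $[31,32] or $[00FE].
HEX_GROUP = re.compile(r"\$\[([0-9A-Fa-f]+(?:,[0-9A-Fa-f]+)*)\]")


def hex_notation_to_bytes(expr: str) -> bytes:
    """Turn '$[31,32]' into b'12' (hypothetical helper, illustration only)."""
    match = HEX_GROUP.fullmatch(expr.strip())
    if match is None:
        raise ValueError(f"not a $[..] hex expression: {expr!r}")
    values = [int(v, 16) for v in match.group(1).split(",")]
    # Assumption: every value must fit in a single byte.
    if any(v > 0xFF for v in values):
        raise ValueError("each value is assumed to be a single byte")
    return bytes(values)


print(hex_notation_to_bytes("$[31,32,33,34,35]"))  # b'12345'
print(hex_notation_to_bytes("$[00FE]"))            # b'\xfe'
```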

  4. #4
    Join Date
    Apr 2009
    Posts
    12

    For clarity: should we set '00FE' in kettle.properties and then set the Delimiter to $[00FE] in the Text file input step? We'll give it a shot and report back.

  5. #5
    Join Date
    Apr 2009
    Posts
    12

    That didn't work. We're trying to use the hex value of the thorn character as the Delimiter in a Text file input step. We're using a Postgres database, and when we parse the files with the thorn character on Windows it works, but on Linux it does not.
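    One possible explanation for the Windows/Linux difference, assuming the delimiter ends up as the single byte FE and is then decoded with the platform's default charset (typically Windows-1252 on Western-European Windows and UTF-8 on Linux; neither point is confirmed in this thread): the same byte decodes to þ on one system and to the Unicode replacement character on the other. A Python sketch of the two decodings, using a hypothetical helper decode_delimiter:

```python
def decode_delimiter(raw: bytes, platform_charset: str) -> str:
    """Decode delimiter bytes the way a platform default charset might."""
    # errors="replace" substitutes U+FFFD for invalid input instead of failing.
    return raw.decode(platform_charset, errors="replace")


raw = b"\xfe"  # þ as a single Windows-1252 / ISO-8859-1 byte
windows = decode_delimiter(raw, "cp1252")
linux = decode_delimiter(raw, "utf-8")
print(f"U+{ord(windows):04X}")  # U+00FE (þ)
print(f"U+{ord(linux):04X}")    # U+FFFD (replacement character)
```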

  6. #6
    Join Date
    Nov 1999
    Posts
    9,729

    So you need a recursive variable, say ${DELIMITER}, that returns $[00FE], which is then converted in turn to the character 0xFE? Sounds like a good JIRA feature request.
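    The two-step expansion described in this reply can be sketched in Python: first substitute ${NAME} variables, then expand any $[..] hex groups that the substituted text contains. The variable name DELIMITER, the helper expand, and the choice of ISO-8859-1 for turning bytes into characters are all assumptions for illustration; this is not how PDI is implemented.

```python
import re

VAR = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")
HEX = re.compile(r"\$\[([0-9A-Fa-f]+(?:,[0-9A-Fa-f]+)*)\]")


def expand(text: str, variables: dict[str, str]) -> str:
    """Hypothetical two-pass expansion: ${NAME} first, then $[..] hex."""
    # Pass 1: replace ${NAME} with its value; unknown names are left as-is.
    text = VAR.sub(lambda m: variables.get(m.group(1), m.group(0)), text)

    # Pass 2: each hex value becomes one byte (values above FF raise
    # ValueError) and is decoded as ISO-8859-1, so FE maps to U+00FE
    # regardless of the platform's default charset.
    def hex_to_text(m: re.Match) -> str:
        data = bytes(int(v, 16) for v in m.group(1).split(","))
        return data.decode("iso-8859-1")

    return HEX.sub(hex_to_text, text)


props = {"DELIMITER": "$[00FE]"}  # as if read from kettle.properties
print(expand("${DELIMITER}", props) == "\u00fe")  # True
```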

  7. #7
    Join Date
    Apr 2009
    Posts
    12

    Darn, that's not what I wanted to hear. Thanks for the help.

  8. #8
    Join Date
    Apr 2009
    Posts
    12

    Just in case someone else stumbles upon this: we did find a solution to the 'thorn' (' þ ') character issue that does not require a variable for the delimiter/separator.

    First: built into the Text file input step is the ability to insert a Unicode character. Right-click the Delimiter/Separator field and you will see "Insert unicode character..." in the list of actions. However, you will not find the thorn character there. Try this one: �.
    Second: with the Postgres database, make sure to set the encoding to UTF-8.
    Third, and probably not required since it may be specific to the file you're working with: set Format (the expected line endings: DOS, Unix or mixed) to mixed.
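    Why the encoding setting matters for a þ delimiter can be illustrated with Python's csv module, standing in for the Text file input step (the sample data and the helper split_thorn_file are made up for illustration). Decoded as UTF-8, þ is one character and works as a single-character delimiter; decoded with the wrong charset, its two UTF-8 bytes become the two characters "Ã¾" and the rows no longer split. The sample also mixes \r\n and \n line endings, which the csv reader accepts.

```python
import csv
import io


def split_thorn_file(raw: bytes, encoding: str) -> list[list[str]]:
    """Parse þ-delimited bytes after decoding them with the given encoding."""
    text = raw.decode(encoding)
    # newline="" keeps the original \r\n / \n endings for the csv reader.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\u00fe")
    return list(reader)


sample = "id\u00fename\r\n1\u00feAlice\n".encode("utf-8")  # mixed line endings
print(split_thorn_file(sample, "utf-8"))       # [['id', 'name'], ['1', 'Alice']]
print(split_thorn_file(sample, "iso-8859-1"))  # each row stays one field
```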

    Works for us. Hope it helps.

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.