Hitachi Vantara Pentaho Community Forums

Thread: Decode utf-8 string from XML document

  1. #1
    Hello,
    I managed to call a SOAP web service using the HTTP Post step and parse the response with Get XML Data.
    The only problem is that some fields contain UTF-8 encoded special characters (like [&amp;#322;], which is the equivalent of [ł], or [&amp;#324;] -> [ń]).
    What should I do to decode these values?

    I would appreciate your help.

    Regards,
    Rob

  2. #2
    Wow... Someone was really paranoid...

    Run the strings through Calculator
    Code:
    New field | Calculation          | Field A | Field B | Field C | Value type | Length | Precision | Remove | Conversion mask | Decimal symbol | Grouping symbol | Currency symbol
    MidStr    | Unescape XML content | InStr   |         |         | String     |        |           | N      |                 |                |                 |
    Where Field A (InStr) is the field you want to fix.
    That will change it from &amp;#322; to &#322;.
    Now run it through the same calculator process again...
    Code:
    New field | Calculation          | Field A | Field B | Field C | Value type | Length | Precision | Remove | Conversion mask | Decimal symbol | Grouping symbol | Currency symbol
    MidStr    | Unescape XML content | InStr   |         |         | String     |        |           | N      |                 |                |                 |
    OutStr    | Unescape XML content | MidStr  |         |         | String     |        |           | N      |                 |                |                 |
    Your OutStr should now contain the characters you're looking for.
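
    If you want to check the two-pass idea outside of PDI, here is a rough Python sketch. It uses the standard library's html.unescape as a stand-in for the Calculator's "Unescape XML content" (an assumption: the two are not guaranteed to behave identically for every entity). The helper name unescape_passes and the sample input are made up for illustration; only the field names InStr, MidStr and OutStr come from the steps above.

```python
import html


def unescape_passes(value: str, passes: int = 2) -> str:
    """Apply entity unescaping `passes` times, like chaining Calculator rows."""
    for _ in range(passes):
        value = html.unescape(value)
    return value


# Hypothetical double-escaped sample, standing in for the InStr field.
in_str = "&amp;#322; &amp;#324;"

mid_str = unescape_passes(in_str, 1)  # after the first pass:  &#322; &#324;
out_str = unescape_passes(in_str, 2)  # after the second pass: ł ń

print(mid_str)
print(out_str)
```

    Each pass only removes one level of escaping, which is why a single Calculator row is not enough for double-escaped data.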

    Last edited by gutlez; 12-15-2017 at 12:23 PM.

  3. #3
    Many thanks gutlez!
