Hitachi Vantara Pentaho Community Forums
Results 1 to 7 of 7

Thread: Loading the character "š" from Oracle database produces the character "?" as a result

  1. #1
    Join Date
    Dec 2011
    Posts
    15

    Default Loading the character "š" from Oracle database produces the character "?" as a result

    Hello,

    we have an Oracle database:
    SELECT * FROM NLS_DATABASE_PARAMETERS;
    NLS_LANGUAGE AMERICAN
    NLS_TERRITORY AMERICA
    NLS_CURRENCY $
    NLS_ISO_CURRENCY AMERICA
    NLS_NUMERIC_CHARACTERS .,
    NLS_CHARACTERSET WE8MSWIN1252
    NLS_CALENDAR GREGORIAN
    NLS_DATE_FORMAT DD-MON-RR
    NLS_DATE_LANGUAGE AMERICAN
    NLS_SORT BINARY
    NLS_TIME_FORMAT HH.MI.SSXFF AM
    NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
    NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
    NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
    NLS_DUAL_CURRENCY $
    NLS_COMP BINARY
    NLS_LENGTH_SEMANTICS BYTE
    NLS_NCHAR_CONV_EXCP FALSE
    NLS_NCHAR_CHARACTERSET AL16UTF16
    NLS_RDBMS_VERSION 11.2.0.3.0

    And there is a character "š" used in some fields. For example:
    PERSON_NAME
    Šurik
    Ašurik

    If we query the field from Toad, then it looks ok. Also from Squirrel.

    If i query it from PentahoDataIntegator TableInput step over JDBC driver the character does not show at all:
    PERSON_NAME
    urik
    Aurik
    If i query it from PentahoDataIntegrator TableInput step over ODBC driver then the character is replaced by ?:
    PERSON_NAME
    ?urik
    A?urik

    If i create a database link from our other database and query it from the other database over the databaselink, then the character loaded is correct "š":
    PERSON_NAME
    Šurik
    Ašurik
    The characterset and other parameters of the other database is:
    SELECT * FROM NLS_DATABASE_PARAMETERS;
    NLS_LANGUAGE AMERICAN
    NLS_TERRITORY AMERICA
    NLS_CURRENCY $
    NLS_ISO_CURRENCY AMERICA
    NLS_NUMERIC_CHARACTERS .,
    NLS_CHARACTERSET AL32UTF8
    NLS_CALENDAR GREGORIAN
    NLS_DATE_FORMAT DD-MON-RR
    NLS_DATE_LANGUAGE AMERICAN
    NLS_SORT BINARY
    NLS_TIME_FORMAT HH.MI.SSXFF AM
    NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
    NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
    NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
    NLS_DUAL_CURRENCY $
    NLS_COMP BINARY
    NLS_LENGTH_SEMANTICS BYTE
    NLS_NCHAR_CONV_EXCP FALSE
    NLS_NCHAR_CHARACTERSET UTF8
    NLS_RDBMS_VERSION 11.2.0.2.0

    So it seems there is somekind of characterset issue here. How could i set up PDI to also load the correct "š" character when directly conncting to the first database?

    Br,
    pxr

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    Do you have the same problem with the Euro currency sign (€) ?
    So long, and thanks for all the fish.

  3. #3
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    Quote Originally Posted by pxr View Post
    Hello,

    we have an Oracle database:
    SELECT * FROM NLS_DATABASE_PARAMETERS;
    NLS_CHARACTERSET WE8MSWIN1252
    NLS_NCHAR_CHARACTERSET AL16UTF16

    The characterset and other parameters of the other database is:
    SELECT * FROM NLS_DATABASE_PARAMETERS;
    NLS_CHARACTERSET AL32UTF8
    NLS_NCHAR_CHARACTERSET UTF8

    So it seems there is somekind of characterset issue here. How could i set up PDI to also load the correct "š" character when directly conncting to the first database?

    Br,
    pxr
    Yes, you do have characterset issues.
    I don't know that PDI can help you sort them out - if you connect to the database, and put data in with one characterset, while telling the database you are putting in a different character set, you have to replicate that mis-match to pull it out cleanly.
    **THIS IS A SIGNATURE - IT GETS POSTED ON (ALMOST) EVERY POST**
    I'm no expert.
    Take my comments at your own risk.

    PDI user since PDI 3.1
    PDI on Windows 7 & Linux

    Please keep in mind (and this may not apply to this thread):
    No forum member is going to do your work for you. We will help you sort out how to do a specific part of the work, as best we can, in the timelines that our work will allow us.
    Signature Updated: 2014-06-30

  4. #4
    Join Date
    Dec 2011
    Posts
    15

    Default

    Quote Originally Posted by marabu View Post
    Do you have the same problem with the Euro currency sign (€) ?
    Hi marabu,

    Over JDBC we have the same problem the Euro sign disappears while loading from the WE8MSWIN1252 characterset database:
    The original string is "TEST_STRING€" but the result over JDBC is "TEST_STRING"

    Over ODBC driver the euro sign is loaded correctly.

    Br,
    Jaanus

  5. #5
    Join Date
    Dec 2011
    Posts
    15

    Default

    Quote Originally Posted by gutlez View Post
    if you connect to the database, and put data in with one characterset, while telling the database you are putting in a different character set, you have to replicate that mis-match to pull it out cleanly.
    Hi gutlez,

    what do you mean by "putting in a different character set"? The data is already in the source database with WE8MSWIN1252 characterset and there is no problem selecting the string "š" from it with databasebrowsers. But when loading it with PDI JDBC driver to a text file the character is lost. The other database was mentioned only because from it the "š" can be loaded to text file correctly and also loading the "š" from the WE8MSWIN1252 characterset over a db-link created in the AL32UTF8 database works correctly.

    Br,
    Jaanus

  6. #6
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    Quote Originally Posted by pxr View Post
    Over JDBC we have the same problem the Euro sign disappears while loading from the WE8MSWIN1252 characterset database
    There is no place for Oracle RDB in my life, but I recall ojdbc to come with built-in support for only a small number of character sets, ASCII and ISO Latin 1 being among them, but not MS1252. That would explain why characters between 0x80 and 0x9F get lost.

    FWIW: NLS Support and Object Types
    Last edited by marabu; 06-25-2013 at 03:33 AM. Reason: found link
    So long, and thanks for all the fish.

  7. #7
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    Quote Originally Posted by pxr View Post
    what do you mean by "putting in a different character set"?
    I mean that I have seen people put things like "š" in an ASCII DB. This is not natively supported by the DB, but when they enter it and retrieve from the same tools, it works. When you access the DB directly, because that letter is not natively supported, it breaks.

    Since the other DB is directly coded to UTF8, it works correctly - it stores the character the way it should be stored, and *ALL* DB users can see the data, not just the tool that entered it.
    **THIS IS A SIGNATURE - IT GETS POSTED ON (ALMOST) EVERY POST**
    I'm no expert.
    Take my comments at your own risk.

    PDI user since PDI 3.1
    PDI on Windows 7 & Linux

    Please keep in mind (and this may not apply to this thread):
    No forum member is going to do your work for you. We will help you sort out how to do a specific part of the work, as best we can, in the timelines that our work will allow us.
    Signature Updated: 2014-06-30

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.