Hitachi Vantara Pentaho Community Forums

Thread: UTF-8 Importing French text

  1. #1
    Join Date
    Apr 2015
    Posts
    5

    UTF-8 Importing French text

    Hello,

    I use the "CSV input" step to import some CSV files written in French.
    I selected UTF-8 in the "File encoding" option, but I have an issue when importing words that contain "é", "è" or "à".

    When I preview the data, I see a small "?" in place of each of these characters:


    Then, when the data is imported into a MySQL database, the character has disappeared.

    Do you know which "File encoding" to choose for French?

    Thank you!

  2. #2
    Join Date
    Aug 2011
    Posts
    360


    Hi,

    You need to know the encoding of the file itself. For example, open it with
    Notepad++; I think it will say ANSI, which means CP1252, also known as Windows-1252 (I assume you're on Windows).
    If you're on a French Windows system, try ISO-8859-15; it should work fine.

    The question is not really the language, but which encoding was used to write your input file.

    PS: I speak French if you prefer!
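
    If Notepad++ is not at hand, the same check can be sketched in a few lines of Python. This is only an illustration of the idea above, not something Pentaho does: `guess_encoding` is a hypothetical helper, and the two candidate encodings are an assumption based on the ones mentioned in this post.

```python
def guess_encoding(raw: bytes, candidates=("utf-8", "cp1252")) -> str:
    """Hypothetical helper: return the first candidate that decodes raw cleanly."""
    for name in candidates:
        try:
            raw.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")


# "é" is a single byte (0xE9) in Windows-1252 but two bytes (0xC3 0xA9) in UTF-8.
ansi_bytes = "café".encode("cp1252")
utf8_bytes = "café".encode("utf-8")

print(ansi_bytes)                  # raw bytes as written by an "ANSI" editor
print(utf8_bytes)                  # raw bytes as written by a UTF-8 editor
print(guess_encoding(ansi_bytes))  # UTF-8 decoding fails, so cp1252 is reported
print(guess_encoding(utf8_bytes))  # UTF-8 decoding succeeds
```

    UTF-8 has to be tried first: single-byte encodings such as CP1252 accept almost any byte sequence, so they would "match" a UTF-8 file too and silently produce wrong characters.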

  3. #3
    Join Date
    Aug 2015
    Posts
    313


    Quote Originally Posted by Mathias.CH View Post
    Hi,

    You need to know the encoding of the file itself. For example, open it with
    Notepad++; I think it will say ANSI, which means CP1252, also known as Windows-1252 (I assume you're on Windows).
    If you're on a French Windows system, try ISO-8859-15; it should work fine.
    I am facing the same issue, using a MySQL database, Java 1.7, Windows and pdi-ce-6.0. When I load data from database to database (i.e. Table input -> Table output), Chinese data is also not populated properly.
    When I use Table input -> Text file output and then Text file input -> Table output, and specify utf8 as the encoding, the data loads correctly.
    Still, I am trying to understand why the data does not load correctly when the same utf8 settings are in place at table and column level in both my SOURCE and TARGET databases.

    Quote Originally Posted by Mathias.CH View Post
    But the question is not really the language, but which encoding was used to write your input file.
    If you look at his post, he mentioned: I selected UTF-8 in the "File encoding" option. Please correct me if I have misunderstood.

    Last edited by santhi; 12-08-2015 at 04:53 AM.

  4. #4
    Join Date
    Apr 2008
    Posts
    4,696


    Quote Originally Posted by santhi View Post

    Quote Originally Posted by Mathias.CH View Post
    But the question is not really the language, but which encoding was used to write your input file.
    If you look at his post, he mentioned: I selected UTF-8 in the "File encoding" option. Please correct me if I have misunderstood.

    Please note that Mathias says what matters is the encoding that was used to *WRITE* the file.
    If the file was written in ISO-8859-15, it won't read correctly as UTF-8. If a file was written in UTF-16, it won't read correctly as UTF-8 either.
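
    A small Python sketch of that mismatch, for illustration only. The sample text is made up, and `errors="replace"` is an assumption chosen so the failure is visible rather than raised as an exception. It is probably also what lies behind the "?" in the original poster's preview: undecodable bytes are commonly replaced by a placeholder character.

```python
text = "é è à"

# Bytes as they would be written by a program using each encoding.
latin9_bytes = text.encode("iso-8859-15")
utf16_bytes = text.encode("utf-16")

# Reading those bytes back as UTF-8: each undecodable byte becomes U+FFFD,
# the Unicode replacement character.
wrong_latin9 = latin9_bytes.decode("utf-8", errors="replace")
wrong_utf16 = utf16_bytes.decode("utf-8", errors="replace")

# Reading with the encoding the bytes were actually written in.
right_latin9 = latin9_bytes.decode("iso-8859-15")

print(repr(wrong_latin9))  # accented letters are lost
print(repr(wrong_utf16))   # garbage, including stray NUL characters
print(repr(right_latin9))  # original text recovered
```

    Only the last call recovers the text, because it reads the bytes with the encoding they were written in.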
