setting character encoding for MySQL connection

08-02-2006, 02:41 AM

I have a problem with character encoding while moving data from a source system (Firebird) to a destination system (MySQL).

In the source system we use character set NONE (no encoding/decoding is performed), and the strings stored there are UTF-8 encoded.

When we move data from source to destination using Kettle, national characters are corrupted. This is probably because the default character set in MySQL is latin1.
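
For illustration, here is a minimal sketch of the kind of corruption described above: UTF-8 bytes that get interpreted as latin1 somewhere along the way. It is not Kettle or MySQL code. The class and method names are made up for this example, and Java's ISO-8859-1 only stands in for MySQL's latin1 here.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Hypothetical helper: take the UTF-8 bytes of a string and decode them
    // as ISO-8859-1, roughly what happens when a latin1 connection receives
    // bytes that are really UTF-8.
    static String misreadAsLatin1(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // e with acute accent, two bytes in UTF-8
        String corrupted = misreadAsLatin1(original);
        // One national character turns into two unrelated characters.
        System.out.println(original.length() + " char -> " + corrupted.length() + " chars");
    }
}
```

Each non-ASCII character comes out as two or more latin1 characters, which would match the symptom of plain ASCII text surviving while national characters do not.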

I tried changing the default charset on the MySQL server to binary or utf8, but without luck. Maybe the charset needs to be changed on the client side, i.e. somewhere in Kettle (perhaps in the connection settings). Is that possible?

We don't want any charset conversion while transforming data and storing it in MySQL, so maybe the binary charset is the right one for us. But how can we set up this behaviour?

Is charset conversion performed by Kettle or by MySQL when storing data to MySQL?

Please help!


08-02-2006, 02:47 AM
Nope, you need to give the MySQL JDBC driver a couple of extra options to work with, see here (http://dev.mysql.com/doc/refman/5.0/en/cj-configuration-properties.html) under "Miscellaneous".
You can do this in the Options tab of your database connection.
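
As a rough sketch of what those extra options amount to: the name/value pairs entered in the Options tab end up as properties on the JDBC connection, which is equivalent to appending them to the connection URL. `useUnicode` and `characterEncoding` are Connector/J property names from the page linked above. The host, port, database name and the helper class below are placeholders invented for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MysqlUrlSketch {
    // Hypothetical helper: append driver options to a MySQL JDBC URL
    // as ?name=value&name=value pairs, in insertion order.
    static String buildUrl(String host, int port, String database, Map<String, String> options) {
        StringBuilder url = new StringBuilder("jdbc:mysql://" + host + ":" + port + "/" + database);
        char separator = '?';
        for (Map.Entry<String, String> option : options.entrySet()) {
            url.append(separator).append(option.getKey()).append('=').append(option.getValue());
            separator = '&';
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("useUnicode", "true");         // Connector/J property
        options.put("characterEncoding", "UTF-8"); // Connector/J property
        // Placeholder host and database; substitute your own.
        System.out.println(buildUrl("localhost", 3306, "target_db", options));
    }
}
```

In Kettle itself you would enter only the two parameter names and their values in the Options tab, without building the URL by hand.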

There have also been a couple of questions about this on this very forum.



08-02-2006, 09:04 AM
Thank you Matt, you helped me a bit. But I still have a problem. I need to handle binary data without any character set in MySQL. I have used the VARBINARY datatype for storing such data. But MySQL still converts the data to latin1 (or whatever other encoding is set in the JDBC driver) when storing it to VARBINARY.

When I set characterEncoding=utf8 in the JDBC driver configuration, UTF-8 data is handled correctly. But other (binary) data causes a data truncation exception, even though it should be stored in a VARBINARY column.
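
One possible contributing factor, offered only as a sketch and not as a diagnosis of the exception above: arbitrary bytes are generally not valid UTF-8, so any step that passes them through a character string under UTF-8 changes them. The example below is plain Java with no database involved. The class and method names are invented for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryRoundTrip {
    // Hypothetical helper: decode bytes into a String with the given charset,
    // encode them back, and report whether the original bytes survived.
    static boolean survivesRoundTrip(byte[] data, Charset charset) {
        byte[] back = new String(data, charset).getBytes(charset);
        return Arrays.equals(data, back);
    }

    public static void main(String[] args) {
        // 0xFF and 0xFE can never appear in valid UTF-8.
        byte[] binary = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x41};
        System.out.println("UTF-8:      " + survivesRoundTrip(binary, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1: " + survivesRoundTrip(binary, StandardCharsets.ISO_8859_1));
    }
}
```

Under UTF-8 the invalid bytes are replaced during decoding, so the data coming out is longer and different from the data going in. A single-byte charset such as ISO-8859-1 maps every byte value and round-trips cleanly. In JDBC terms, the conversion-free route would be to keep the value as a byte[] and bind it with PreparedStatement.setBytes rather than setString. Whether Kettle does that for a given field type is not something this sketch can confirm.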

MySQL 5.0.22 has a "binary" character set (try 'show character set'). When I set characterEncoding=binary I cannot connect to the database; I get this JDBC driver error:
Unknown character set: 'usa7'

Any idea how to handle binary data without any charset conversion?