I've been gathering some interesting and useful information when dealing with Pentaho Reporting, Pentaho Metadata and characters not represented in the standard ASCII character set. This bucket of tips will make it into our documentation ASAP, but I thought it prudent to share it with our community even sooner.

IMPORTANT CAVEAT: Note that where I specify UTF-8, I am only doing that as a reference encoding... the encoding I speak of in most cases can represent any extended character set; UTF-8 is a common one for multi-national apps, because it represents multi-national characters.

Character encoding is key to displaying multi-byte or special characters from character sets outside of the standard ASCII character set. Any text-based files that contain special characters in their glyph form must be encoded as at least UTF-8, or in the character encoding for the language you are attempting to display.

The character encoding is significant no matter where these characters reside or travel - if the file or database stores the characters as UTF-8, then Java must handle those characters as UTF-8 and where ever the characters' destination is, be it a browser window or system file, the destination must also render the characters using the same character encoding.

So, this means:

1. Check any and all TXT or CSV files in an appropriate editor to verify that they are encoded in the correct character encoding. In a pinch, Notepad will do, but if you are seriously dealing with localization, it's in your best interest to invest in or download a good unicode text editor.

2. Make sure that your HTML and XML files have a meta tag specifying your chosen as the character set. For example:

<span style="font-weight: bold; font-family: courier new;font-size:85%;" >