Character encoding is a PITA

I don’t know if it’s just us, but we seem to periodically run into the same troubles concerning character sets in Java web apps. This time I got bitten once again by the bug that causes international characters entered into an HTML form field to come out as an unintelligible string of random symbols when displayed on another page.
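For the record, the garbling itself is easy to reproduce outside a web app. Here is a minimal sketch, assuming the usual culprit: the browser sends UTF-8 bytes and some layer decodes them as ISO-8859-1. The class and method names are made up for illustration, and the non-ASCII characters are written as Unicode escapes so the source file's own encoding can't get in the way.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any real application.
class MojibakeDemo {

    // Encode the text as UTF-8 (what the browser sends), then decode the
    // same bytes as ISO-8859-1 (what a misconfigured layer assumes).
    static String garble(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u00e3\u00f5\u00e7"; // a-tilde, o-tilde, c-cedilla
        String garbled = garble(original);
        // Each accented character takes two bytes in UTF-8, and each byte
        // becomes one separate character when read as ISO-8859-1.
        System.out.println(original.length() + " chars in, "
                + garbled.length() + " chars out");
    }
}
```

Every accented letter turns into two symbols, which is exactly the kind of string that shows up on the other page.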

Some time ago, I came to the conclusion that if we used UTF-8 everywhere, we’d have fewer problems. And, indeed, it helps: we encode all files in UTF-8, set the container and form encoding to UTF-8 in web.xml, ensure all pipelines serve UTF-8 streams, put the relevant <meta> tags in the HTML output, specify UTF-8 as the charset in the JDBC connection properties and use a UTF-8 encoded database. With all that in place, when a problem comes up it is a safe bet that someone, somewhere, forgot one of these precautions, and fixing it is usually just a matter of finding where.
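When hunting for the forgotten spot, a small probe can help tell whether a given string has already been mis-decoded by the time it reaches a layer. What follows is only a heuristic sketch, assuming the mistake is UTF-8 bytes read as ISO-8859-1; the class and method names are hypothetical. It re-encodes the string as ISO-8859-1 and checks whether the resulting bytes form strictly valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for debugging, not a general-purpose fix.
class EncodingCheck {

    // Returns the repaired text if 'suspect' looks like UTF-8 that was
    // decoded as ISO-8859-1; otherwise returns 'suspect' unchanged.
    static String repairIfMisdecoded(String suspect) {
        // A char above U+00FF cannot come out of an ISO-8859-1 decode.
        for (int i = 0; i < suspect.length(); i++) {
            if (suspect.charAt(i) > 0xFF) {
                return suspect;
            }
        }
        byte[] raw = suspect.getBytes(StandardCharsets.ISO_8859_1);
        // Strict decoder: report invalid byte sequences instead of
        // silently replacing them.
        CharsetDecoder strict = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strict.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8, so the string was probably decoded correctly.
            return suspect;
        }
    }
}
```

Logging the result of such a check at each layer (request parameter, service call, database read) narrows down where the encoding went wrong. It can misfire on text that legitimately contains sequences like "\u00c3\u00a3", so it is better suited to debugging than to production code.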

Until today, at least. Today I spent a few hours fighting with this kind of bug, painstakingly reviewing everything to make sure that UTF-8 was specified everywhere, to no avail. At last, I gave up and set the page encoding to ISO-8859-1, and everything is now hunky-dory again.

Sometimes I wish the whole world would stop using national character sets and everything were encoded as 7-bit ASCII once again. 128 characters should be enough!

3 Responses to “Character encoding is a PITA”


  • Hey, go and tell that to Chinese or Japanese people ;-)

  • With all due respect, what do you call “national characters”? Non-Anglo-Saxon characters? Is the “w” a national character? And what about the “k”? And “$”? Until the age of 14, I was told the “w” was not a valid character, but the “ñ” was.

    The problem is that all these things started in English-speaking countries. I think the problem gets solved with proper use of Unicode.

  • Yep! I too am constantly running into problems because of this bloody thing. But, curiously enough, I never had any trouble until the big boss decided I had to use UTF-8. I’ve been programming for over 10 years now and ISO-8859-1 was always a perfect choice!

    By the way, here are some of my national characters: ã,õ,â,ê,ô,ç
