the blogger archives (2007-2008)

Yeah! UTF-8 and Unicode in Py3k!

Published: 2007-07-22T23:12:00.000+02:00
Tags: utf-8, py3k, unicode, python

From what I just read in GvR's Python 3000 Status Update, it seems that many of the problems that crop up when using non-ASCII characters in string literals will finally be gone in Python 3.0. This has been a problem while developing the new codebase: everything (database, code and HTML output) is UTF-8 encoded, and if you are not careful enough, Python 2.4 will bite you with an exception (string literals containing umlauts, etc.).
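One common way this bites in Python 2 (an assumption about the kind of exception meant here) is that byte strings were silently decoded with the ASCII codec whenever they were combined with unicode objects. The sketch below, written for Python 3, makes that hidden step explicit; the sample value is hypothetical.

```python
# Python 2 implicitly decoded byte strings as ASCII when mixing them with
# unicode objects; this sketch performs that step explicitly in Python 3.

utf8_bytes = "Müller".encode("utf-8")  # e.g. a UTF-8 value read from a database

try:
    utf8_bytes.decode("ascii")  # roughly what Python 2 did behind your back
except UnicodeDecodeError as exc:
    print("ASCII decode failed:", exc.reason)  # prints ASCII decode failed: ordinal not in range(128)

# Decoding with the encoding the data actually uses works fine.
print(utf8_bytes.decode("utf-8"))  # prints Müller

# Python 3 refuses to mix str and bytes implicitly instead of guessing.
try:
    "Name: " + utf8_bytes
except TypeError:
    print("str + bytes raises TypeError")
```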

Making UTF-8 the default source encoding will make things easier for code with non-ASCII string literals. The current ASCII default, which means every such file needs a coding declaration, is one of the quirks I dislike about the current Python. I'm glad this problem is being taken care of in Python 3.0.
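A minimal sketch of the new default, written for Python 3 (the helper `run_source` and the example strings are hypothetical): source bytes without a coding declaration are decoded as UTF-8, while a Latin-1 file still needs a PEP 263 coding declaration on its first line.

```python
# Python 3 decodes source code as UTF-8 unless a PEP 263 coding
# declaration says otherwise (Python 2 assumed ASCII instead).

def run_source(source: bytes) -> dict:
    """Compile and execute raw source bytes; return the resulting namespace."""
    namespace: dict = {}
    exec(compile(source, "<example>", "exec"), namespace)
    return namespace


# UTF-8 source with umlauts and no coding declaration works out of the box.
utf8_source = 'greeting = "Grüße aus Österreich"\n'.encode("utf-8")
print(run_source(utf8_source)["greeting"])  # prints Grüße aus Österreich

# Latin-1 source still needs an explicit declaration on the first line.
latin1_source = b'# -*- coding: latin-1 -*-\nname = "M\xfcller"\n'
print(run_source(latin1_source)["name"])  # prints Müller

# Without the declaration, the Latin-1 byte 0xFC is not valid UTF-8,
# so the source is rejected when it is compiled.
try:
    run_source(b'name = "M\xfcller"\n')
except (SyntaxError, UnicodeDecodeError) as exc:
    print("rejected:", type(exc).__name__)
```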

Thomas Perl (m at thp io); jabber: