Question

Given the following code, run from the Python interpreter:

import sys
sys.getdefaultencoding()
my_string = '\xc3\xa9'
my_string = unicode(my_string, 'utf-8')
my_string
print my_string

On a Mac running Python 2.6.1, everything works fine:

$ python
Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> my_string = '\xc3\xa9'
>>> my_string = unicode(my_string, 'utf-8')
>>> my_string
u'\xe9'
>>> print my_string
é
>>>

On Ubuntu 10.04 LTS running Python 2.6.5, it fails:

$ python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> my_string = '\xc3\xa9'
>>> my_string = unicode(my_string, 'utf-8')
>>> my_string
u'\xe9'
>>> print my_string
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)
>>>

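Did something change between Python 2.6.1 and 2.6.5 that requires different handling of unicode strings? Or is this caused by something misconfigured in my (default Ubuntu Server 10.04 LTS) Linux environment?

Edit: both environments have LANG=en_US.UTF-8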

Answer 1

This can happen with the C locale. Try running Python as LANG=en_US.UTF-8 python and run your code again.

Answer 2

I can reproduce the error with the following command:

$ PYTHONIOENCODING=ascii python -c'print "\xc3\xa9".decode("utf-8")'

Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0:\
ordinal not in range(128)

sys.getdefaultencoding() is 'ascii' by default, which is not very useful here.

Try using your console encoding:
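
The traceback comes from the encode step that print performs implicitly when it writes a unicode string to a byte stream. Below is a minimal sketch that reproduces that step without a console. It is written so that it also runs on Python 3 (an assumption, since the thread uses Python 2), and the helper name try_encode is hypothetical:

```python
# Sketch: reproduce the encode step that print performs implicitly.
# try_encode is a hypothetical helper, not part of any library.

def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


e_acute = u'\xe9'  # the single code point U+00E9

ascii_result = try_encode(e_acute, 'ascii')  # ASCII has no byte for U+00E9
utf8_result = try_encode(e_acute, 'utf-8')   # UTF-8 represents it with two bytes

print(ascii_result)
print(utf8_result == b'\xc3\xa9')
```

Which codec print actually picks depends on the output stream, which is why the same code can behave differently in two environments.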

$ PYTHONIOENCODING=utf-8 python -c'print "\xc3\xa9".decode("utf-8")'
é

or

$ python -c'import locale; print "\xc3\xa9".decode("utf-8").encode(
> locale.getpreferredencoding())'
é

Check sys.stdout.encoding:

$ python -c'import sys; o = sys.stdout; print o.isatty(), o.encoding'
True UTF-8

$ python -c'import sys; o = sys.stdout; print o.isatty(), o.encoding' | cat
False None

$ python -c'import sys; o = sys.stdout; print o.isatty(), o.encoding' >/tmp/out
$ cat /tmp/out
False None

If sys.stdout.encoding is None, try using locale.getpreferredencoding() or set PYTHONIOENCODING as shown above. See http://wiki.python.org/moin/PrintFails

If the error occurs only in an interactive Python session, take a look at sys.displayhook().
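
The fallback described above can be sketched as a small helper. This is only an illustration under stated assumptions: pick_encoding, encode_for and FakeStream are hypothetical names, and FakeStream stands in for sys.stdout so the behaviour can be tried without redirecting output. The code is written to run on Python 3 as well.

```python
import locale


def pick_encoding(stream):
    """Return stream.encoding when it is set, else the locale's preferred encoding."""
    encoding = getattr(stream, 'encoding', None)
    return encoding or locale.getpreferredencoding()


def encode_for(stream, text):
    """Encode text for stream, replacing characters the codec cannot represent."""
    return text.encode(pick_encoding(stream), 'replace')


class FakeStream(object):
    """Hypothetical stand-in for sys.stdout with a configurable encoding."""

    def __init__(self, encoding):
        self.encoding = encoding


print(encode_for(FakeStream('utf-8'), u'\xe9'))
print(encode_for(FakeStream('ascii'), u'\xe9'))
```

Passing 'replace' trades the exception for a placeholder character, which may or may not be what you want when output is piped to another program.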

Answer 3

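Have you tried prefixing your string with u?

my_string = u'\xc3\xa9'

See http://docs.python.org/howto/unicode.html#unicode-literals-in-python-source-code

In Python source code, Unicode literals are written as strings prefixed with the 'u' or 'U' character: u'abcdefghijk'. Specific code points can be written using the \u escape sequence, which is followed by four hex digits giving the code point. The \U escape sequence is similar, but expects 8 hex digits, not 4.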
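
A short sketch of how these escapes behave may help here. One caveat worth checking: inside a unicode literal, \xNN names a code point rather than a byte, so the two escapes from the question become two characters instead of the single character é. The variable names are illustrative, and the code is written to run on Python 3 as well.

```python
# \xNN, \uNNNN and \UNNNNNNNN all name a code point inside a unicode literal.
same_char = (u'\xe9' == u'\u00e9' == u'\U000000e9')

# Inside a byte string, \xc3\xa9 is the two-byte UTF-8 encoding of U+00E9.
# Inside a unicode literal, the same escapes give two separate characters.
two_chars = u'\xc3\xa9'

# Round-trip through Latin-1 to recover the original bytes, then decode them.
repaired = two_chars.encode('latin-1').decode('utf-8')

print(same_char)
print(len(two_chars))
print(repaired == u'\xe9')
```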

Answer 4

Python 3.6.8 or later

@jfs's answer

$ PYTHONIOENCODING=utf-8 python file.py

worked for me. Also, if you want to make this the default, you can add the following line to your bashrc or zshrc:

export PYTHONIOENCODING="utf-8"
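
To check that the variable takes effect, one option is to start a child interpreter with it set and ask what encoding its stdout reports. This is a rough sketch, assuming the interpreter at sys.executable honours PYTHONIOENCODING; child_stdout_encoding is a hypothetical helper name.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Run a child interpreter with PYTHONIOENCODING set and report its sys.stdout.encoding."""
    env = dict(os.environ)
    env['PYTHONIOENCODING'] = io_encoding
    code = 'import sys; sys.stdout.write(str(sys.stdout.encoding))'
    output = subprocess.check_output([sys.executable, '-c', code], env=env)
    # Normalise the codec name so that 'UTF-8' and 'utf-8' compare equal.
    return codecs.lookup(output.decode('ascii').strip()).name


print(child_stdout_encoding('utf-8'))
print(child_stdout_encoding('ascii'))
```

The child's stdout is a pipe here, which is the case where sys.stdout.encoding was None in the earlier answer, so the sketch also shows the variable covering redirected output.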
