
Question

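I have a list containing special characters (for example é, or spaces). When I print the list, these characters are printed as escape codes, but when I print the list elements individually they print correctly: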

#!/usr/bin/env python
# -*- coding: utf-8 -*-

my_list = ['éléphant', 'Hello World']
print(my_list)
print(my_list[0])
print(my_list[1])

The output of this code is:

['\xc3\xa9l\xc3\xa9phant', 'Hello World']
éléphant
Hello World

I would like the first output to be ['éléphant', 'Hello World']. What should I change?

Answer 1

If possible, switch to Python 3 and you will get the expected result.

If you must do this in Python 2, use unicode strings:

my_list = [u'éléphant', u'Hello World']

As it stands, Python interprets the first string as a sequence of bytes with the value '\xc3\xa9l\xc3\xa9phant', which only becomes Unicode code points after correct UTF-8 decoding: '\xc3\xa9l\xc3\xa9phant'.decode('utf8') == u'\xe9l\xe9phant'
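The same bytes-versus-text distinction still exists in Python 3, where a `bytes` object plays the role of a Python 2 byte string. A minimal Python 3 sketch, assuming a UTF-8 source file; the variable names are only illustrative:

```python
# Python 3 sketch: a bytes object stands in for a Python 2 byte string.
raw = 'éléphant'.encode('utf-8')   # the UTF-8 bytes that a Python 2 'éléphant' literal holds
text = raw.decode('utf-8')         # decoding turns those bytes into Unicode code points

print(raw)                         # b'\xc3\xa9l\xc3\xa9phant'
print(text == '\xe9l\xe9phant')    # True: each é is the single code point U+00E9
print(len(raw), len(text))         # 10 8: two bytes per é, but one code point per é
```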

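If you want to print the list's repr and get the actual "unicode" characters out, you have to decode the escapes and encode the result as UTF-8 yourself (if UTF-8 is what your terminal understands):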

>>> print repr(my_list).decode('unicode-escape').encode('utf8')
[u'éléphant', u'Hello World']

But formatting it manually is easier:

>>> print ", ".join(my_list)
éléphant, Hello World
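The `unicode-escape` trick from this answer can be reproduced in Python 3 for illustration: `ascii()` gives an escaped, ASCII-only repr much like Python 2's `repr()` of a unicode list, and decoding that with the `unicode_escape` codec turns the `\xe9` escapes back into characters. This is a sketch only; `unescaped_repr` is a hypothetical helper name, and because the codec rewrites every backslash sequence, strings that themselves contain backslashes will not round-trip cleanly.

```python
def unescaped_repr(items):
    """Hypothetical helper: a list repr with its \\x / \\u escapes turned back into characters."""
    escaped = ascii(items)                                   # ASCII-only repr, non-ASCII escaped
    return escaped.encode('ascii').decode('unicode_escape')  # interpret the escapes again

my_list = ['éléphant', 'Hello World']
print(ascii(my_list))                                          # ['\xe9l\xe9phant', 'Hello World']
print(unescaped_repr(my_list) == "['éléphant', 'Hello World']")  # True
```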

Answer 2

Short answer: if you want the output in this format, you have to implement it yourself:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

my_list = ['éléphant', 'Hello World']

def print_list(l):
    # Build the list-like text by hand so each element goes through str(), not repr()
    print("[" + ", ".join(["'%s'" % str(x) for x in l]) + "]")

print_list(my_list)

This produces the expected

['éléphant', 'Hello World']

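But note that it puts every element in quotes (numbers too, for example), so if your list may contain anything other than strings, you may need a more elaborate implementation.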
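One way to address that caveat is to quote only the string elements and fall back to `repr()` for everything else. A minimal sketch written for Python 3 (in Python 2 the check would be `isinstance(x, basestring)`); `format_list` is a hypothetical name, and it does not escape quote characters that appear inside the strings:

```python
def format_list(items):
    """Hypothetical helper: list-like text that quotes only the string elements."""
    parts = []
    for x in items:
        if isinstance(x, str):
            parts.append("'%s'" % x)   # strings: quoted, characters left exactly as they are
        else:
            parts.append(repr(x))      # numbers and other objects: their usual repr, no quotes
    return "[" + ", ".join(parts) + "]"

print(format_list(['Hello World', 42, 3.5]))   # ['Hello World', 42, 3.5]
```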

Longer answer

The problem is that Python runs str(my_list) before printing it, and str() on a list in turn runs repr() on each list element.
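The claim that a list formats its elements with `repr()` rather than `str()` can be checked with a class whose two methods return different text. `Tagged` is a hypothetical class used only for this demonstration; the behaviour is the same in Python 2 and 3:

```python
class Tagged(object):
    """Hypothetical class whose str() and repr() differ, to show which one a list uses."""

    def __str__(self):
        return 'str-form'

    def __repr__(self):
        return 'repr-form'

item = Tagged()
print(str(item))     # str-form
print(str([item]))   # [repr-form]: the list formats its element with repr()
```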

Now, repr() on a string returns an ASCII-only representation of the string. That is, each '\xc3' you see is an actual backslash, an actual 'x', an actual 'c' and an actual '3' character.
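The escape really is made of ordinary characters. A small Python 3 sketch, again using `bytes` as the analogue of a Python 2 byte string; note that Python 3 adds a `b` prefix, so the repr is one character longer than in Python 2:

```python
# Python 3 sketch: bytes play the role of a Python 2 str here.
r = repr('é'.encode('utf-8'))   # Python 2's repr("é") would give '\xc3\xa9' without the b
print(r)                        # b'\xc3\xa9'
print(list(r[2:6]))             # ['\\', 'x', 'c', '3']: four separate ordinary characters
print(len(r))                   # 11: the b, two quotes, and two 4-character escapes
```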

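You can't easily work around this, because the problem lies in the implementation of list.__str__().

Below is an example program that demonstrates this.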

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# vi: ai sts=4 sw=4 et
# Python 2 only: "print c" below is a print statement, and the literals are byte strings

import pprint

my_list = ['éléphant', 'Hello World']

# under the hood, python first runs str(my_list), before printing it
my_list_as_string = str(my_list)

# str() on a list runs repr() on each of the elements.
# However, it seems that __repr__ on a string transforms it to an 
# ASCII-only representation
print ('str(my_list) = %s' % str(my_list))
for c in my_list_as_string:
    print c
print ('len(str(my_list)) = %s' % len(str(my_list)))
print ("\n")

# Which we can confirm here, where we can see that it also adds the quotes:
print ('repr("é") == %s' % repr("é"))
for c in repr("é"):
    print c
print ('len(repr("é")) == %s' % len(repr("é")))
print ("\n")

# Even pprint fails
print ("pprint gives the same results")
pprint.pprint(my_list)

# It's useless to try to encode it, since all data is ASCII
print "Trying to encode"
print (my_list_as_string.encode ("utf8"))

This produces:

str(my_list) = ['\xc3\xa9l\xc3\xa9phant', 'Hello World']
[
'
\
x
c
3
\
x
a
9
l
\
x
c
3
\
x
a
9
p
h
a
n
t
'
,

'
H
e
l
l
o

W
o
r
l
d
'
]
len(str(my_list)) = 41


repr("é") == '\xc3\xa9'
'
\
x
c
3
\
x
a
9
'
len(repr("é")) == 10


pprint gives the same results
['\xc3\xa9l\xc3\xa9phant', 'Hello World']
Trying to encode
['\xc3\xa9l\xc3\xa9phant', 'Hello World']
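The encoding attempt changes nothing because str(my_list) already consists only of ASCII escape characters, so encoding it as UTF-8 yields the same bytes as encoding it as ASCII. The same can be checked in Python 3, with `bytes` standing in for the Python 2 byte strings (a sketch; names are illustrative):

```python
# Python 3 sketch: a list of bytes mirrors the Python 2 list of byte strings.
my_list = ['éléphant'.encode('utf-8'), b'Hello World']
as_string = str(my_list)   # built from repr() of each element

print(as_string)                                              # [b'\xc3\xa9l\xc3\xa9phant', b'Hello World']
print(all(ord(c) < 128 for c in as_string))                   # True: only ASCII characters remain
print(as_string.encode('utf-8') == as_string.encode('ascii'))  # True: UTF-8 encoding adds nothing
```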

Special characters in a list in Python

