Question

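I want to use unicode instead of str for all the strings in my project. I'm trying to use the str.encode method, but I can't work out from the documentation what encode actually does or what it expects as input.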

The Greek lowercase letter pi is U+03C0, which is 0xCF 0x80 when encoded in UTF-8. I get the following:

>>> s1 = '\xcf\x80'
>>> s1.encode('utf-8','ignore')

Traceback (most recent call last):
  File "<pyshell#61>", line 1, in <module>
    s1.encode('utf-8','ignore')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcf in position 0: ordinal not in range(128)

I also tried this:

>>> s2='\x03\xc0'

>>> s2.encode('utf-8','ignore')

Traceback (most recent call last):
  File "<pyshell#62>", line 1, in <module>
    s2.encode('utf-8','ignore')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc0 in position 1: ordinal not in range(128)

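What does encode expect as input, and why doesn't the 'ignore' option ignore the error? I tried 'replace' as well, and that doesn't suppress the error either.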
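The byte values quoted in the question can be checked by hand. U+03C0 needs 11 bits, so UTF-8 stores it in two bytes of the form 110xxxxx 10xxxxxx. Here is a minimal sketch that does that arithmetic with a hypothetical helper, utf8_two_byte (illustration only, not a library function), and compares the result with the built-in codec. It also shows that the bytes tried in s2, '\x03\xc0', are the code point written out big-endian (UTF-16-BE), not its UTF-8 form.

```python
# Minimal sketch (runs on Python 2.7 and Python 3): build the two-byte
# UTF-8 form of a code point by hand and compare it with the built-in codec.

def utf8_two_byte(code_point):
    """Hypothetical helper: encode a code point in U+0080..U+07FF as UTF-8.

    For illustration only; real code should call .encode('utf-8') on a
    unicode object instead.
    """
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("two-byte UTF-8 covers U+0080..U+07FF only")
    lead = 0xC0 | (code_point >> 6)     # 110xxxxx carries the top 5 of 11 bits
    trail = 0x80 | (code_point & 0x3F)  # 10xxxxxx carries the low 6 bits
    return bytearray([lead, trail])

manual = utf8_two_byte(0x03C0)          # GREEK SMALL LETTER PI
print(" ".join("0x%02X" % b for b in manual))        # prints 0xCF 0x80

# The built-in codec agrees with the hand computation.
print(bytes(manual) == u'\u03c0'.encode('utf-8'))    # prints True

# The bytes tried in s2 are the raw code point (UTF-16-BE), not UTF-8.
print(u'\u03c0'.encode('utf-16-be') == b'\x03\xc0')  # prints True
```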

Answer 1

In Python 2.x, str is a byte string (that is, it is already encoded). You can decode it into a unicode object:

>>> s1 = '\xcf\x80'  # string literal (str)
>>> s1.decode('utf-8')
u'\u03c0'
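The errors argument does take effect when it is passed to decode itself. A small sketch, runnable on Python 2.7 and 3; the deliberately truncated byte string is made up for illustration:

```python
# Sketch: the errors argument applies to the decode call it is passed to.
# Runs on Python 2.7 and Python 3.

good = b'\xcf\x80'         # complete UTF-8 encoding of U+03C0
broken = b'\xcf\x80\xcf'   # ends with an incomplete two-byte sequence

print(good.decode('utf-8') == u'\u03c0')       # True

try:
    broken.decode('utf-8')                     # 'strict' is the default
except UnicodeDecodeError as exc:
    print("strict decode failed: %s" % exc.reason)

# With a non-strict handler the bad trailing byte is dropped or replaced.
print(broken.decode('utf-8', 'ignore') == u'\u03c0')         # True
print(broken.decode('utf-8', 'replace') == u'\u03c0\ufffd')  # True (U+FFFD)
```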

A unicode object, in turn, can be encoded:

>>> u1 = u'\u03c0'  # unicode literal (unicode)  U+03C0
>>> u1.encode('utf-8')
'\xcf\x80'
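This also explains the traceback in the question. In Python 2, calling encode on a non-ASCII str first converts it to unicode using the default encoding (normally ASCII) with strict error handling; the 'ignore' or 'replace' argument is only passed to the encoding step, which is never reached. That is why the error is a UnicodeDecodeError from the 'ascii' codec. The usual approach is to decode bytes explicitly, work with unicode, and encode only on output. A sketch of that round trip, runnable on Python 2.7 and 3, also showing where errors does apply when encoding; to_text is a hypothetical helper name, not a standard function:

```python
# Sketch of "decode early, encode late". Runs on Python 2.7 and Python 3.

def to_text(value, encoding='utf-8'):
    """Hypothetical helper: return unicode text, decoding bytes if needed."""
    if isinstance(value, bytes):   # bytes is an alias of str on Python 2
        return value.decode(encoding)
    return value                   # already unicode: pass it through

raw = b'\xcf\x80'                  # UTF-8 bytes, e.g. read from a file
text = to_text(raw)                # u'\u03c0', safe to handle as text
assert to_text(text) is text       # text is returned unchanged

# Encoding a unicode object: here the errors argument takes effect.
print(text.encode('utf-8') == raw)               # True: round trip
print(text.encode('ascii', 'ignore') == b'')     # True: pi dropped
print(text.encode('ascii', 'replace') == b'?')   # True: pi becomes '?'
print(text.encode('ascii', 'xmlcharrefreplace') == b'&#960;')  # True
```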

