Encoding/decoding errors with UTF-8

Asked: 2012-02-27 20:17:19

Tags: python unicode utf-8 ascii

How would I correctly encode the following?

# -*- coding: utf-8 -*-

>>> 'What\x80\x99s Up: Balloon to the Rescue!'.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 4: ordinal not in range(128)
>>> 'What\x80\x99s Up: Balloon to the Rescue!'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 4: invalid start byte

2 answers:

Answer 0 (score: 3)

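There are two problems here. First, your UTF-8 byte sequence is wrong; it should be \xe2\x80\x99 (the UTF-8 encoding of U+2019, the right single quotation mark). Second, you are using the wrong function; you need to decode from UTF-8: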

>>> print 'What\xe2\x80\x99s Up: Balloon to the Rescue!'.decode('utf-8')
What’s Up: Balloon to the Rescue!
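
As a rough sketch of the same point, the snippet below checks the byte sequence in both directions. It assumes Python 3, where a `b'...'` literal plays the role of the Python 2 `str` used in the question; the variable names are only illustrative.

```python
# Python 3 sketch: bytes literals (b'...') stand in for Python 2's str.
right_quote = '\u2019'  # RIGHT SINGLE QUOTATION MARK

# Encoding the character yields the three-byte sequence named in the answer.
encoded = right_quote.encode('utf-8')
print(encoded)  # b'\xe2\x80\x99'

# Decoding the corrected byte string gives the intended text.
fixed = b'What\xe2\x80\x99s Up: Balloon to the Rescue!'
text = fixed.decode('utf-8')
print(ascii(text))  # ascii() keeps the output safe on any terminal encoding

# The bytes from the question lack the leading 0xE2, so strict decoding fails.
broken = b'What\x80\x99s Up: Balloon to the Rescue!'
try:
    broken.decode('utf-8')
    decode_failed = False
except UnicodeDecodeError as exc:
    decode_failed = True
    print('decode failed at position', exc.start)
```

The failure position reported for the broken input matches the "position 4" in the question's traceback, since 0x80 is a continuation byte and cannot start a UTF-8 sequence.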

Answer 1 (score: 0)

>>> type('What\x80\x99s Up: Balloon to the Rescue!')
<type 'str'>

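So you can't encode it, because it isn't Unicode. In Python 2, calling .encode() on a str first decodes it implicitly as ASCII, which is why the first call raises a UnicodeDecodeError rather than an encode error.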

What is your Unicode input?
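
To make the type distinction above concrete, here is a small sketch under the assumption of Python 3, where the split between bytes and text is explicit and the implicit ASCII step from Python 2 no longer exists. The lenient decode is only a way to inspect the damaged input; it does not recover the missing byte.

```python
# Python 3 sketch: the question's literal, written as bytes.
raw = b'What\x80\x99s Up: Balloon to the Rescue!'
print(type(raw).__name__)  # bytes

# bytes objects have no .encode() in Python 3; only text (str) can be encoded.
has_encode = hasattr(raw, 'encode')
print(has_encode)  # False

# Lenient decoding replaces each undecodable byte with U+FFFD,
# which helps locate the damage without raising an exception.
lenient = raw.decode('utf-8', errors='replace')
print(ascii(lenient))
```

Both stray bytes (0x80 and 0x99) come out as replacement characters, which suggests the input lost its leading 0xE2 somewhere upstream rather than being in a different encoding.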