I'm trying to understand why commands 2 and 5 raise errors:
1 >>> s = 'itï¿½s'
2 >>> s.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)
3 >>> print s.decode('utf-8').encode('utf-8')
itï¿½s
4 >>> bytearray(s.decode('utf-8').encode('utf-8'))
bytearray(b'it\xc3\xaf\xc2\xbf\xc2\xbds')
5 >>> bytearray(s.decode('utf-8').encode('utf-8', 'ignore'), 'utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)
I thought s.decode('utf-8','ignore').encode('utf-8') converts s to UTF-8 while ignoring the odd characters. So why does converting to UTF-8 inside bytearray raise an error?
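
One possible reading of the tracebacks, sketched in Python 3 because it forces the hidden step to be written out. The session above is Python 2, where s is already a byte string, so calling .encode() on it first has to decode it, and that implicit decode appears to use the default 'ascii' codec in strict mode. The helper py2_style_encode is hypothetical and only imitates that behaviour. Treating bytearray(byte_string, 'utf-8') as the same kind of encode call is also an assumption. The bytes are copied from the output of command 4.

```python
# Python 3 sketch of the suspected Python 2 behaviour (not the real internals).
raw = b'it\xc3\xaf\xc2\xbf\xc2\xbds'  # the bytes of s, taken from command 4


def py2_style_encode(byte_string, encoding='utf-8', errors='strict'):
    """Hypothetical helper: imitate Python 2's str.encode() on a byte string."""
    # Assumed implicit step: decode with the default 'ascii' codec, strictly.
    text = byte_string.decode('ascii')
    # The errors argument only reaches this second, explicit step.
    return text.encode(encoding, errors)


def try_encode(byte_string, errors='strict'):
    """Return the encoded bytes, or the error message if the call fails."""
    try:
        return py2_style_encode(byte_string, 'utf-8', errors)
    except UnicodeDecodeError as exc:
        return str(exc)


# Like command 2: encoding a byte string trips over the implicit ASCII decode.
message_cmd2 = try_encode(raw)

# Inner part of command 5: an explicit UTF-8 decode followed by encode works.
encoded = raw.decode('utf-8').encode('utf-8', 'ignore')

# Outer part of command 5 (assumed analogue of bytearray(encoded, 'utf-8')):
# the result is bytes again, so asking for another encode fails the same way.
message_cmd5 = try_encode(encoded, 'ignore')

# Like command 4: no encoding argument, so nothing needs to be decoded.
as_bytearray = bytearray(encoded)

print(message_cmd2)
print(message_cmd5)
print(as_bytearray)
```

If this reading is right, 'ignore' never gets a chance to apply in command 5, because the failure happens in the implicit ASCII decode rather than in the encode that the handler was passed to. Dropping the second argument to bytearray, as in command 4, avoids the extra encode.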