Question

The printed results look the same, so what are UTF-8 encoding and decoding actually for? And should I write encode('utf8') or encode('utf-8')?

u = 'abc'
print(u)
u = u.encode('utf-8')    # str -> bytes
print(u)
uu = u.decode('utf-8')   # bytes -> str
print(uu)

Answer 1

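str.encode encodes a string (or unicode string) into a sequence of bytes. In Python 3 the result is a bytes object; in Python 2 it is a str again (confusingly). When you encode a unicode string you are left with bytes, not unicode. Remember that UTF-8 is not Unicode: it is an encoding method that turns Unicode code points into bytes.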
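A small Python 3 sketch of the code point vs. byte distinction. The string "abc€" is an arbitrary example chosen for illustration; the ASCII letters happen to encode to the same single bytes, which is why printing 'abc' before and after encoding looks so similar.

```python
# Python 3: a str holds Unicode code points; .encode() produces a bytes object.
text = "abc€"
encoded = text.encode("utf-8")

print(type(text), len(text))        # 4 code points
print(type(encoded), len(encoded))  # 6 bytes: 'a', 'b', 'c' take 1 byte each, '€' takes 3
print(hex(ord("€")))                # the code point of '€' is U+20AC
print(encoded)                      # the raw UTF-8 bytes
```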

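bytes.decode (str.decode in Python 2) decodes a serialized byte stream with the chosen codec, maps it back to the correct Unicode code points, and gives you a unicode string.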
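To illustrate why the codec choice matters when decoding, here is a minimal Python 3 sketch. Latin-1 is used only as an example of a "wrong" codec; any mismatched codec can produce similar garbage.

```python
raw = b"\xe2\x82\xac"  # the UTF-8 encoding of U+20AC (EURO SIGN)

# Decoding with the codec that produced the bytes recovers the original code point.
right = raw.decode("utf-8")

# Decoding with a different codec does not fail here, but maps each byte to the wrong code point.
wrong = raw.decode("latin-1")

# ascii() shows non-ASCII code points as escapes, so the difference is visible in any terminal.
print(ascii(right), len(right))   # one code point
print(ascii(wrong), len(wrong))   # three code points

# Bytes that are not valid UTF-8 raise UnicodeDecodeError instead of being guessed at.
try:
    b"\xff".decode("utf-8")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)
```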

So what you are doing in Python 2 is 'abc' -> 'abc' -> u'abc', and in Python 3 it is 'abc' -> b'abc' -> 'abc'. Try printing repr(u) or type(u) as well to see what changes at each step.
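Following that suggestion, the question's code can be instrumented like this (Python 3 only; the comments show what each print call outputs):

```python
u = "abc"
print(repr(u), type(u))    # 'abc' <class 'str'>
u = u.encode("utf-8")
print(repr(u), type(u))    # b'abc' <class 'bytes'>
uu = u.decode("utf-8")
print(repr(uu), type(uu))  # 'abc' <class 'str'>
```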

utf_8 might be the most canonical spelling, but it doesn't matter: 'utf-8', 'utf8', and 'utf_8' are all aliases for the same codec.
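One way to check that the spellings are interchangeable is to ask Python's codec registry directly. This sketch uses the standard-library codecs.lookup function:

```python
import codecs

# Each spelling is looked up in the codec registry; all of them resolve to the same codec.
for name in ("utf-8", "utf8", "utf_8", "UTF-8"):
    print(name, "->", codecs.lookup(name).name)

# Consequently the encoded bytes are identical whichever spelling you pass to encode().
assert "abc€".encode("utf8") == "abc€".encode("utf-8") == "abc€".encode("utf_8")
```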

Answer 2

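In Python 2, if you call .encode('utf-8') on a byte string (str), Python first implicitly decodes it to unicode (using the default ASCII codec) and only then encodes it back to UTF-8. A unicode string holds code points, which are independent of any character set used for 8-bit strings.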
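The implicit decode step is Python 2 behavior. In Python 3 it no longer exists, and converting bytes between charsets has to be done explicitly. A minimal sketch, using Latin-1 purely as an example source charset:

```python
# Python 3: bytes objects have no .encode() method, so no implicit decode can happen.
print(hasattr(b"abc", "encode"))  # False

# Converting bytes from one charset to another is an explicit decode followed by an encode.
latin1_bytes = "\u00e9".encode("latin-1")  # 'é' in Latin-1: b'\xe9' (1 byte)
text = latin1_bytes.decode("latin-1")      # back to a str holding U+00E9
utf8_bytes = text.encode("utf-8")          # 'é' in UTF-8: b'\xc3\xa9' (2 bytes)
print(latin1_bytes, utf8_bytes)
```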

For example:

data = u'\u00c3'            # Unicode data (Python 2): the code point U+00C3
data = data.encode('utf8')  # now an 8-bit str holding the UTF-8 bytes
print repr(data)

Output: '\xc3\x83'
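The same example ported to Python 3 might look like this, with a loop added to show the bit patterns UTF-8 uses for a two-byte character:

```python
data = "\u00c3"             # Unicode data: the single code point U+00C3
data = data.encode("utf8")  # UTF-8 needs two bytes for this code point
print(data)                 # b'\xc3\x83'

# Bit patterns: a 110xxxxx lead byte followed by a 10xxxxxx continuation byte.
for byte in data:
    print(f"{byte:08b}")
```

The printed bit strings are 11000011 and 10000011; the payload bits after the 110 and 10 prefixes concatenate to 00011000011, which is 0xC3.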

See here and here; they should help.
