Question

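I am trying to understand encodings in Python, and I think I have almost got it. So here is some code that I will explain, and I would like you to verify my understanding :)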

text = line.decode( encoding )
print "type(text) = %s" % type(text)
iso_8859_1 = text.encode('latin1')
print "type(iso_8859_1) = %s" % type(iso_8859_1)
unicodeStr = text.encode('utf-8')
print "type(unicodeStr) = %s" % type(unicodeStr)

So the first line

text = line.decode( encoding )

converts the given string, which is encoded in "encoding", into Python's unicode text format. Hence the output

type(text) = <type 'unicode'>

Now, the original text comes from a file in utf-8 encoding, so for the rest of my code "text" is a utf-8 text.

Now I want to convert the utf-8 text (for whatever reason) into something else, e.g. latin1, which is done by "text.encode('latin1')". In this case the output of my code is

type(iso_8859_1) = <type 'str'>
type(unicodeStr) = <type 'str'>

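Now, the only question that remains is: why is the type in the latter two cases 'str' and not 'latin1' or 'unicode'? That is still unclear to me.

Aren't the latter strings "iso_8859_1" and "unicodeStr" encoded as 'latin1' and 'unicode', respectively?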

Answer 1

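First of all, utf8 != unicode.
A str is basically a sequence of bytes, an encoding is a way of interpreting such a sequence, and unicode is unicode: text with no encoding attached. That is why encode() always returns a str, whichever encoding you ask for. Joel has a great post on this topic: http://www.joelonsoftware.com/articles/Unicode.html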
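
To make that concrete, here is a small sketch (not your code, just an illustration; the sample word and the variable names are made up). It is written so that it should run on both Python 2 and Python 3. On Python 2 the encoded values are of type str, on Python 3 they are of type bytes, but the point is the same either way: the type does not record which encoding produced the bytes.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch; "text", "as_latin1" and "as_utf8" are made-up names.

text = u"caf\u00e9"  # unicode text: four code points, no encoding attached

as_latin1 = text.encode("latin1")  # bytes: 63 61 66 e9
as_utf8 = text.encode("utf-8")     # bytes: 63 61 66 c3 a9

# Both results have the same type (str on Python 2, bytes on Python 3).
same_type = type(as_latin1) is type(as_utf8)
print(same_type)       # True

# The byte sequences themselves differ, and so do their lengths.
print(len(as_latin1))  # 4
print(len(as_utf8))    # 5

# Decoding with the matching encoding recovers the same unicode text.
round_trip_ok = (as_latin1.decode("latin1") == text
                 and as_utf8.decode("utf-8") == text)
print(round_trip_ok)   # True

# Decoding with the wrong encoding does not fail here, it just yields
# different text, because the bytes carry no label saying what they are.
misread = as_utf8.decode("latin1")
print(misread != text)  # True
```

So nothing in the resulting object remembers 'latin1' or 'utf-8'. If you need to know how some bytes are encoded, you have to keep track of that yourself.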
