Question

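I probably don't understand this at all, so could you look at the code examples and tell me what I should do to be sure it will work?

I tried this in Eclipse with PyDev. I use Python 2.6.6 (because some libraries don't support Python 2.7).

First, without using the codecs module: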

# -*- coding: utf-8 -*-

file1 = open("samoloty1.txt", "w")
file2 = open("samoloty2.txt", "w")
file3 = open("samoloty3.txt", "w")
file4 = open("samoloty4.txt", "w")
file5 = open("samoloty5.txt", "w")
file6 = open("samoloty6.txt", "w")

# I know this looks weird, but it shows that whatever I do, nothing breaks...
print u"ą✈✈"
file1.write(u"ą✈✈")
print "ą✈✈"
file2.write("ą✈✈")

print "ą✈✈".decode("utf-8")
file3.write("ą✈✈".decode("utf-8"))
print "ą✈✈".encode("utf-8")
file4.write("ą✈✈".encode("utf-8"))

print u"ą✈✈".decode("utf-8")
file5.write(u"ą✈✈".decode("utf-8"))
print u"ą✈✈".encode("utf-8")
file6.write(u"ą✈✈".encode("utf-8"))

file1.close()
file2.close()
file3.close()
file4.close()
file5.close()
file6.close()

file1 = open("samoloty1.txt", "r")
file2 = open("samoloty2.txt", "r")
file3 = open("samoloty3.txt", "r")
file4 = open("samoloty4.txt", "r")
file5 = open("samoloty5.txt", "r")
file6 = open("samoloty6.txt", "r")

print file1.read()
print file2.read()
print file3.read()
print file4.read()
print file5.read()
print file6.read()

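Every print works correctly; I don't get any funny characters.

I also tried this: I deleted all the files generated by the previous test and changed only these lines: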

file1 = open("samoloty1.txt", "w")

to these:

file1 = codecs.open("samoloty1.txt", "w", encoding='utf-8')

Again, everything works...

Can anyone give some useful examples?

Should this be a separate question? I'm downloading web pages like this:

content = urllib.urlopen(some_url).read()
ucontent = unicode(content, encoding) # i get encoding from headers

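Is this correct? What should I do to store it in a UTF-8 file? (I'm asking because whatever I did before, it worked...)

** UPDATE **

Probably everything worked because the terminal in PyDev (or just Eclipse) uses UTF-8 encoding. So for testing I used cmd in Windows 7, and I got some errors. Now everything crashes as expected. :D Here I show what I changed to make it run again (all of these changes seem reasonable to me; they agree with what I learned from the answers and from the Python documentation).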

print u"ą✈✈".encode("utf-8") # added encode
file1.write(u"ą✈✈".encode("utf-8")) # added encode
print "ą✈✈"
file2.write("ą✈✈")

print "ą✈✈" # removed .decode("utf-8")
file3.write("ą✈✈") # removed .decode("utf-8"))
print "ą✈✈" # removed .encode("utf-8")
file4.write("ą✈✈") # removed .encode("utf-8"))

print u"ą✈✈".encode("utf-8") # changed from .decode("utf-8")
file5.write(u"ą✈✈".encode("utf-8")) # changed from .decode("utf-8")
print u"ą✈✈".encode("utf-8")
file6.write(u"ą✈✈".encode("utf-8"))

As someone said, when I use codecs I don't need to call encode() every time before writing to a file. :) The question is, which answer should be marked as correct?

Answer 1

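You are lucky that your console encoding is utf-8 by default.

If you pass a unicode object to the write method of a file object such as sys.stdout, it is implicitly encoded using that object's encoding attribute.

People working on Windows are not so lucky: How to workaround Python "WindowsError messages are not properly encoded" problem?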
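A rough sketch of what that implicit encoding amounts to, and of why a Windows console can behave differently. The code page cp852 is an assumption (a common default for Polish-language cmd windows), and encode_for_console is a hypothetical helper, not something from the question. The characters are written as escapes so the snippet does not depend on the source file's encoding.

```python
import sys

text = u"\u0105\u2708\u2708"  # the three characters used in the question

# Encoding the console reports; it can be None when output is piped.
console_encoding = sys.stdout.encoding or "ascii"


def encode_for_console(value, encoding):
    """Encode explicitly, replacing anything the target cannot represent."""
    return value.encode(encoding, "replace")


utf8_bytes = encode_for_console(text, "utf-8")   # lossless on a UTF-8 console
cp852_bytes = encode_for_console(text, "cp852")  # the airplanes become "?"

# Strict encoding (no "replace") is what an implicit conversion uses,
# so on a cp852 console it fails instead of silently losing characters.
try:
    text.encode("cp852")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

On a UTF-8 terminal such as the one in Eclipse the strict path never fails, which would explain why every print in the question appeared to work.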

Answer 2

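All the write exercises in your snippet really boil down to two cases:

When you write a string to a file
When you try to write a unicode string to a file

Let's call the string s and the unicode string u.

Then fileN.write(s) makes sense, while fileN.write(u) does not. I don't know your setup (maybe you have made some changes to your Python site configuration), but the following is expected to break here: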

# -*- coding: utf-8 -*-
ff = open("ff.txt", "w")
ff.write(u"ą✈✈")
ff.close()

It fails with:

Traceback (most recent call last):
  File "ex.py", line 5, in <module>
    ff.write(u"ą✈✈")
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

This means that a unicode string should be converted to a byte string before it is written to a file. Your file6 example shows how to do this:

u"ą✈✈".encode("utf-8")
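A small sketch of the difference, under the assumption that the failure in the traceback comes from an implicit conversion with the ascii codec. It performs that conversion explicitly, then writes the UTF-8 bytes to a temporary file (the temporary path is just for illustration).

```python
import os
import tempfile

text = u"\u0105\u2708\u2708"  # same characters as in the question

# Doing explicitly what ff.write(u"...") attempts implicitly in Python 2.
try:
    text.encode("ascii")
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

# Choosing the encoding yourself gives bytes that any file can store.
data = text.encode("utf-8")

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "wb") as handle:
    handle.write(data)
with open(path, "rb") as handle:
    round_trip = handle.read().decode("utf-8")
os.remove(path)
```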

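The magic string -*- coding: utf-8 -*- is just a way to write unicode string literals in a WYSIWYG manner: u"ą✈✈". It does not help you determine the encoding in any other situation.

So don't pass unicode strings to the .write() method in Python 2.6. Good practice is to use unicode strings inside your code, and to convert to and from a concrete encoding at the input/output boundaries.

The codecs example is fine, and so is the urllib one.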
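For the urllib part of the question, a minimal sketch of decoding at the input boundary and letting codecs.open encode at the output boundary. The helper save_as_utf8, the sample text, and the iso-8859-2 encoding are all assumptions for illustration; the byte string stands in for urllib.urlopen(some_url).read() so no network access is needed.

```python
import codecs
import os
import tempfile


def save_as_utf8(raw_bytes, source_encoding, path):
    """Decode downloaded bytes, then store the text as UTF-8."""
    text = raw_bytes.decode(source_encoding)  # bytes -> unicode
    handle = codecs.open(path, "w", encoding="utf-8")
    try:
        handle.write(text)  # codecs encodes on the way out
    finally:
        handle.close()
    return text


# Stand-in for the downloaded page body.
content = u"za\u017c\u00f3\u0142\u0107".encode("iso-8859-2")
encoding = "iso-8859-2"  # assumed to come from the response headers

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
ucontent = save_as_utf8(content, encoding, path)

# Read the raw bytes back to confirm the file really holds UTF-8.
with open(path, "rb") as handle:
    stored_bytes = handle.read()
os.remove(path)
```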

Answer 3

What you are doing is correct. For details, see the Python Unicode HOWTO.

The general principles are:

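When binary data enters your application (e.g. open(), urllib.urlopen()), use the decode() method to obtain a unicode string.
- If the byte string is not valid for the supplied encoding, you may get a UnicodeDecodeError. In that case, do one of the following:
  1. Use the second argument of decode() to replace or ignore the bad characters.
  2. Try harder to find out what the real encoding is.
  3. Fix the input, if it really is broken.
- For files, you can use the codecs.open wrapper to do this for you transparently.
- Network data usually has to be decoded manually, but sometimes the payload declares its own encoding (e.g. HTML, XML), and sometimes it does not match the headers!
- For database data, the database driver will usually offer some way to encode/decode transparently and always hand you unicode strings. Otherwise you need to encode/decode manually.
Use unicode strings inside your application.
Before data leaves your application, call encode() on the unicode string to produce bytes in the desired encoding.
- If the target encoding cannot represent some unicode characters, you may get a UnicodeEncodeError. In that case, do one of the following:
  1. Use the second argument of encode() to ignore or replace the characters that cannot be represented in the target encoding.
  2. Don't generate those characters in your application.
  3. Find another way to represent them. In XML, for example, you can use numeric character entities.
- For files, you can use the codecs.open wrapper to encode for you transparently.
- For database connections, the driver will usually accept unicode strings and encode them for you.
- For network connections, you usually have to encode manually. Sometimes the payload is generated by a library that encodes correctly for you (e.g. when writing XML).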
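The error-handling options in the two lists above can be sketched as follows. The sample byte string and the choice of latin-1 as its "real" encoding are assumptions made up for the illustration.

```python
# Input boundary: bytes that are valid Latin-1 but not valid UTF-8.
raw = b"caf\xe9 \xff"

try:
    raw.decode("utf-8")  # strict mode is the default
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

replaced = raw.decode("utf-8", "replace")  # bad bytes become U+FFFD
ignored = raw.decode("utf-8", "ignore")    # bad bytes are dropped
correct = raw.decode("latin-1")            # using the real encoding

# Output boundary: characters that ASCII cannot represent.
text = u"\u0105\u2708"

dropped = text.encode("ascii", "ignore")
question_marks = text.encode("ascii", "replace")
# Numeric character references, suitable for XML or HTML output.
xml_safe = text.encode("ascii", "xmlcharrefreplace")
```

The replace and ignore handlers lose information, so they fit best where a damaged character is acceptable; finding the real encoding keeps the data intact.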

Answer 4

Because you correctly used the magic "coding comment", everything works as expected.

Why do all these unicode commands in Python work correctly? Whatever I do, they print my characters correctly
