Question

I've already read this:

Setting the correct encoding when piping stdout in Python

I'm trying to stick to the rule of thumb: "Unicode everywhere internally. Decode what you receive, and encode what you send."

So here's my main file:

# coding: utf-8

import os
import sys

from myplugin import MyPlugin
if __name__ == '__main__':
    c = MyPlugin()
    a = unicode(open('myfile.txt').read().decode('utf8'))
    print(c.generate(a).encode('utf8'))

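What makes me nervous is:

I read in a utf8 file, so I decode it.
Then I force-convert it to unicode, which gives unicode(open('myfile.txt').read().decode('utf8'))
Then I try to output it to a terminal.
On my Linux shell I need to re-encode it to utf8, and I guess this is normal: I'm working on a unicode string the whole time, so to output it I have to re-encode it as utf8 (correct me if I'm wrong here).
When I run it with PyCharm under Windows, it comes out utf8-encoded twice, which gives me things like agrÃ©able, dÃ©jÃ. So if I remove encode('utf8') (which changes the last line to print(c.generate(a))), it works in PyCharm but no longer works on Linux, where I get: 'ascii' codec can't encode character u'\xe9' in position blabla, you know the problem.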

If I try on the command line:

Linux / shell over ssh: import sys; sys.stdout.encoding gives me 'UTF-8'
Linux / shell, from within my code: import sys; sys.stdout.encoding gives me None. WTF??
Windows / PyCharm: import sys; sys.stdout.encoding gives me 'windows-1252'

What is the best way to code this so that it works in both environments?

Answer 1

unicode(open('myfile.txt').read().decode('utf8'))

There's no need to wrap it in unicode(), because the result of str.decode is already a unicode object.
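A quick way to convince yourself, as a minimal sketch. The sample string is made up, and text_type is just a helper name of mine so the snippet runs on both Python 2 and Python 3:

```python
# Hypothetical sample: UTF-8 bytes, like what open('myfile.txt').read()
# returns on Python 2.
raw = u'agr\xe9able'.encode('utf-8')

# type(u'') is `unicode` on Python 2 and `str` on Python 3.
text_type = type(u'')

decoded = raw.decode('utf-8')

# decode() already returns a Unicode string ...
already_unicode = isinstance(decoded, text_type)
# ... so wrapping it in the text type again changes nothing.
same_after_wrap = text_type(decoded) == decoded

print(already_unicode)
print(same_after_wrap)
```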

print(c.generate(a).encode('utf8'))

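The encode isn't needed, because Python encodes unicode strings itself when printing, based on the terminal encoding.

So this is the correct way to do it: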

print(c.generate(a))
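For what it's worth, the garbled PyCharm output from the question can be reproduced without a console. This is only a sketch, assuming the console interprets the bytes as windows-1252 (the value sys.stdout.encoding reported there). Strictly speaking the text is not encoded twice; the UTF-8 bytes are just displayed with the wrong codec:

```python
# The two sample words from the question, written with escapes.
text = u'agr\xe9able, d\xe9j\xe0'

# What .encode('utf8') hands to print: raw UTF-8 bytes.
utf8_bytes = text.encode('utf-8')

# Assumption: the console reads those bytes as windows-1252.
seen_on_console = utf8_bytes.decode('cp1252')

# Each accented letter was two bytes, so it shows up as two characters:
# A-tilde followed by a copyright sign or a no-break space.
expected = u'agr\xc3\xa9able, d\xc3\xa9j\xc3\xa0'

print(seen_on_console == expected)
```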

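You get 'ascii' codec can't encode character u'\xe9' in position because Python believes your Linux output is ASCII, so it can't print non-ASCII unicode characters to it. That is also what happens when sys.stdout.encoding is None, as in your code: Python 2 then falls back to the ascii codec.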

See https://wiki.python.org/moin/PrintFails
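The error itself is easy to reproduce. As a rough sketch: when no output encoding is known, what Python 2's print does is more or less equivalent to the explicit encode below (the sample word is made up):

```python
text = u'agr\xe9able'

try:
    # Roughly what print does on Python 2 when it has to fall back to ASCII.
    text.encode('ascii')
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

print(error_message)
```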

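I suggest fixing your terminal (environment), not the code. You shouldn't depend on the terminal encoding, especially since this kind of output usually ends up being written to files.

If you still want to print to any terminal that supports ASCII, you can use str.encode('unicode-escape'):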

>>> print(u'щхжы'.encode('unicode-escape'))
\u0449\u0445\u0436\u044b

But that isn't very readable for humans, so I don't see the point.

Answer 2

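Your philosophy is right, but you're overcomplicating things and making your code brittle.

Open files in text mode, so that they are converted to Unicode for you automatically. Then print without encoding; print is supposed to work out the correct encoding.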

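If your Linux environment isn't set up correctly, set the PYTHONIOENCODING=utf-8 variable in it (export PYTHONIOENCODING=utf-8) to fix any issues during printing. You should also consider setting your locale to a UTF-8 variant such as en_GB.UTF-8, so that you don't have to define PYTHONIOENCODING at all.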
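To see what the variable does, here is a small sketch that starts a child interpreter with its output piped (so there is no terminal to detect) and asks it for sys.stdout.encoding. The helper name child_stdout_encoding is my own, not part of any library:

```python
import os
import subprocess
import sys


def child_stdout_encoding(extra_env):
    """Return sys.stdout.encoding as seen by a child Python whose stdout is a pipe."""
    env = dict(os.environ)
    env.pop('PYTHONIOENCODING', None)  # start without the variable
    env.update(extra_env)
    out = subprocess.check_output(
        [sys.executable, '-c', 'import sys; print(sys.stdout.encoding)'],
        env=env,
    )
    return out.decode('ascii').strip()


# With the variable set, the child reports UTF-8 even though stdout is a pipe.
print(child_stdout_encoding({'PYTHONIOENCODING': 'utf-8'}))
```

Calling child_stdout_encoding({}) shows what you get without the variable. On Python 2 with piped output that should be the None from the question; Python 3 picks a locale-based default instead.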

PyCharm will work without modification.

Your code should look like this:

import os
import sys
import io

from myplugin import MyPlugin

if __name__ == '__main__':
    c = MyPlugin()
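    # 't' (text mode) is the default; encoding= makes read() return Unicode
    with io.open('myfile.txt', 'rt', encoding='utf-8') as myfile:
        # a is now a Unicode string
        a = myfile.read()

    result = c.generate(a)
    # No .encode() here: print picks the encoding of the output stream.
    # The parenthesised form works on both Python 2 and Python 3.
    print(result)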

If you're using Python 3.x, drop import io and the io. prefix from io.open().
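If you want to try the reading part without myplugin, here is a self-contained sketch. It first writes a throwaway UTF-8 file (the temporary file and its sample text are made up for the example, standing in for myfile.txt) and then reads it back in text mode:

```python
import io
import os
import tempfile

# Create a temporary UTF-8 file standing in for myfile.txt.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
try:
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(u'agr\xe9able, d\xe9j\xe0')

    # Text mode plus encoding= decodes while reading; no .decode() call needed.
    with io.open(path, 'r', encoding='utf-8') as f:
        a = f.read()
finally:
    os.remove(path)

# type(u'') is `unicode` on Python 2 and `str` on Python 3.
is_unicode = isinstance(a, type(u''))
print(is_unicode)
```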

Python 2.7 unicode confusion again
