Question

I have a Python 3 program that reads some strings from a Windows-1252 encoded file:

with open(file, 'r', encoding="cp1252") as file_with_strings:
    # save some strings

Later I want to write those strings to stdout. Here is what I have tried:

print(some_string)
# => UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 180: ordinal not in range(128)

print(some_string.decode("utf-8"))
# => AttributeError: 'str' object has no attribute 'decode'

sys.stdout.buffer.write(some_str)
# => TypeError: 'str' does not support the buffer interface

print(some_string.encode("cp1252").decode("utf-8"))
# => UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 180: invalid continuation byte

print(some_string.encode("cp1252"))
# => has the unfortunate result of printing b'<my string>' instead of just the string

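I'm scratching my head here. I want to print the strings I read from the file exactly as they appear in cp1252. (In my terminal, when I run more $file, these characters show up as question marks, so my terminal may be ASCII.)

Would love some clarification! Thanks!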

Answer 1

For anyone with the same problem, this is what I ended up doing:

import sys

# Encode explicitly, then write the raw bytes past the text layer of stdout
to_print = (some_string + "\n").encode("cp1252")
sys.stdout.buffer.write(to_print)
sys.stdout.flush() # I write a ton of these strings, and segfaulted without flushing
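
The same approach can be wrapped in a small helper so the encode, write, and flush steps stay together. This is only a sketch: the name `write_encoded` and its parameters are hypothetical, and the demonstration writes to an in-memory `io.BytesIO` rather than a real terminal so the resulting bytes can be inspected.

```python
import io
import sys


def write_encoded(text, stream=None, encoding="cp1252"):
    """Encode text plus a newline and write the bytes to a binary stream.

    Hypothetical helper; defaults to sys.stdout.buffer, the binary layer
    underneath the text-mode sys.stdout.
    """
    if stream is None:
        stream = sys.stdout.buffer
    data = (text + "\n").encode(encoding)
    stream.write(data)
    stream.flush()
    return data


# Demonstration against an in-memory binary stream instead of the terminal.
demo = io.BytesIO()
write_encoded("caf\u00e9", demo)
print(demo.getvalue())
```

Calling `write_encoded(some_string)` with no stream argument would send the cp1252 bytes to the real stdout, as in the answer above.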

Answer 2

When you encode with cp1252, you have to decode with the same codec.

For example:

import sys
txt = ("hi hello\n").encode("cp1252")
#print((txt).decode("cp1252"))
sys.stdout.buffer.write(txt)
sys.stdout.flush()

This prints "hi hello" followed by a newline: the text is encoded to cp1252 bytes, and those bytes are written to stdout unchanged.
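
A minimal sketch of why the codec has to match, using a sample string chosen here for illustration (not taken from the question's file). Decoding cp1252 bytes with cp1252 round-trips cleanly, while decoding the same bytes as UTF-8 raises the kind of error shown in the question.

```python
text = "caf\u00e9"              # sample text containing U+00E9
raw = text.encode("cp1252")     # the accented letter becomes the single byte 0xE9

# Decoding with the codec that produced the bytes restores the original text.
round_trip = raw.decode("cp1252")

# Decoding with a different codec fails: a lone 0xE9 is not valid UTF-8.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

print(raw, round_trip == text, utf8_failed)
```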

Answer 3

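Either you are piping your script's output or your locale is broken. You should fix your environment rather than adapt your script to it, because adapting the script makes it very fragile.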

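If you are piping, Python assumes the output should be "ASCII" and sets the encoding of stdout to "ASCII". (This describes Python 2; Python 3 generally takes the encoding of a piped stdout from the locale as well, so a broken or "C" locale is the more likely cause there.)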

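Under normal circumstances, Python uses the locale to work out which encoding to apply to stdout. If your locale is broken (not installed or corrupt), Python falls back to "ASCII". A locale of "C" also gives you an encoding of "ASCII".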
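
To see which encoding Python actually picked, you can inspect stdout and the locale from inside the interpreter. This is a small diagnostic sketch; the function name `describe_stdout` is made up for illustration, and the values it reports depend entirely on your environment.

```python
import locale
import sys


def describe_stdout():
    """Collect the settings that decide how text reaches stdout."""
    return {
        # Encoding Python will use when print() writes to stdout.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        # Error handler applied to characters the encoding cannot represent.
        "stdout_errors": getattr(sys.stdout, "errors", None),
        # Encoding derived from the current locale settings.
        "locale_encoding": locale.getpreferredencoding(False),
        # False when stdout is a pipe or a file rather than a terminal.
        "isatty": sys.stdout.isatty(),
    }


info = describe_stdout()
for key, value in info.items():
    print(key, "=", value)
```

If `stdout_encoding` reports an ASCII variant such as `ANSI_X3.4-1968`, a `print` of any non-ASCII character will raise the `UnicodeEncodeError` from the question.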

Type locale to check your locale, and make sure it returns no errors. For example:

$ locale
LANG="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_CTYPE="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_ALL=

If all else fails, or if you are piping, you can override Python's locale detection by setting the PYTHONIOENCODING environment variable. For example:

$ PYTHONIOENCODING=utf-8 ./my_python.sh

Remember that your shell has a locale and your terminal has an encoding; both need to be set correctly.
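
The effect of PYTHONIOENCODING can be checked without touching your shell by launching a child interpreter with the variable set and capturing the bytes it writes. This is a sketch under assumptions: the helper name `run_child_with_encoding` and the sample text are illustrative, and the child is simply the same Python executable running a one-line `print`.

```python
import os
import subprocess
import sys


def run_child_with_encoding(encoding):
    """Run a child Python with PYTHONIOENCODING set and return its raw stdout."""
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    # The child prints "caf" followed by U+00E9; the escape is resolved in the child.
    child_code = "print('caf\\xe9')"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


cp1252_bytes = run_child_with_encoding("cp1252")
utf8_bytes = run_child_with_encoding("utf-8")
print(cp1252_bytes.strip(), utf8_bytes.strip())
```

The same character comes out as one byte (0xE9) under cp1252 and as two bytes (0xC3 0xA9) under UTF-8, which is why the terminal's own encoding has to agree with whatever you choose.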

Answer 4

Since Python 3.7, you can change the encoding of all text written to sys.stdout with its reconfigure method:

import sys

sys.stdout.reconfigure(encoding="cp1252")

This is useful if you need to change the encoding of all of your program's output.
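
To see what reconfigure does to the bytes without changing your real stdout, the same call can be tried on an in-memory text stream. This is a sketch assuming Python 3.7 or later; the `io.TextIOWrapper` around `io.BytesIO` stands in for sys.stdout, and `errors="replace"` is an optional extra (not part of the answer above) that substitutes `?` for characters cp1252 cannot represent instead of raising.

```python
import io

# Stand-in for sys.stdout: a text layer over an in-memory binary buffer.
# newline="\n" keeps line endings identical on every platform.
buffer = io.BytesIO()
stream = io.TextIOWrapper(buffer, encoding="ascii", newline="\n")

# Switch the text layer to cp1252; unencodable characters become "?".
stream.reconfigure(encoding="cp1252", errors="replace")

# U+00E9 exists in cp1252, U+4E2D does not.
stream.write("caf\u00e9 \u4e2d\n")
stream.flush()

written = buffer.getvalue()
print(written)
```

On the real stream the equivalent call would be `sys.stdout.reconfigure(encoding="cp1252", errors="replace")`.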

Printing to stdout with encoding in Python 3
