Question

I am using the following snippet to read a file. It produces two different outputs on a Windows machine and on a Linux server. I am using Python 3.

with open('test.txt','rb') as f:
    data = f.read().decode('utf-8')
    print(type(data.splitlines()[34560]))
    print(data.splitlines()[34560])

Result on Windows:

<class 'str'>
testpair14/user_photos/images/282/original/Capture d’écran 2012-09-07 à 2.50.31 PM20120917-37935-13g7sn1-0_1347875141.png

Result on Linux:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 53-54: ordinal not in range(128)

What could be the reason for this? Please advise.

Answer 1

To get started, read https://docs.python.org/3/howto/unicode.html

To read a text file, simply open it as a text file and specify the encoding as needed:

open('test.txt','r', encoding="utf-8")

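Read operations on this file will return Unicode strings (str) rather than byte strings. In general, whenever you are handling text, always work with Unicode objects.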
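As a minimal sketch of the text-mode approach, the snippet below writes a small UTF-8 file to a temporary directory and reads it back with an explicit encoding. The helper name read_lines and the sample file contents are made up for illustration and are not part of the original question.

```python
import os
import tempfile


def read_lines(path, encoding="utf-8"):
    """Open the file in text mode and return its lines as str objects."""
    with open(path, "r", encoding=encoding) as f:
        return f.read().splitlines()


# Hypothetical sample data containing non-ASCII characters.
sample = "Capture d’écran 2012-09-07 à 2.50.31 PM.png"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    # Write the file as UTF-8 so the demo does not depend on the locale.
    with open(path, "w", encoding="utf-8") as f:
        f.write("first line\n" + sample + "\n")
    lines = read_lines(path)

# Every element is already a str; no manual .decode() call is needed.
assert all(isinstance(line, str) for line in lines)
```

Passing encoding explicitly matters because the default used by open() in text mode depends on the platform's locale, which is one way the same script can behave differently on two machines.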

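Printing Unicode to the console is another can of worms, and it is particularly poorly supported on Windows. However, there are already plenty of answers about this problem on StackOverflow, for example: Python, Unicode, and the Windows console and Understanding Python Unicode and Linux terminal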
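The traceback in the question mentions the 'ascii' codec, which suggests that standard output on the Linux server is using ASCII, so print() fails on the accented characters. One possible workaround, sketched below, is to escape whatever the output encoding cannot represent before printing. The helper name printable and the sample string are assumptions made for this illustration.

```python
import sys


def printable(text, encoding=None):
    """Return text with characters the target encoding cannot represent
    replaced by backslash escapes, so that printing it cannot raise
    UnicodeEncodeError for that encoding."""
    encoding = encoding or sys.stdout.encoding or "ascii"
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


# Hypothetical sample resembling the file name from the question.
name = "Capture d’écran à 2.50.31 PM.png"

# Force ASCII here to mimic the Linux server's behaviour.
safe = printable(name, "ascii")
print(safe)
```

If the terminal can actually display UTF-8, it is usually nicer to fix the output encoding instead of escaping, for example by setting the PYTHONIOENCODING=utf-8 environment variable, or by calling sys.stdout.reconfigure(encoding="utf-8") on Python 3.7 and later.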
