Question

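I just installed Anaconda on a Windows 10 machine (Python 2.7.12 | Anaconda 4.2.0 (64-bit)) and I'm having trouble reading text from a file. Please see the code and output below. I want the actual text from the file.

Thanks!

Output: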

 ['\xff\xfeT\x00h\x00i\x00s\x00',
  '\x00i\x00s\x00',
   '\x00a\x00',
   '\x00t\x00e\x00s\x00t\x00.\x00',
   '\x00',
   '\x00',
   '\x00',
   '\x00T\x00h\x00i\x00s\x00',
   '\x00i\x00s\x00',
   '\x00a\x00',
   '\x00t\x00e\x00s\x00t\x00']

Code:

try:
    with open('test.txt', 'r') as f:
        text = f.read()
except Exception as e:
    print e
print text.split()

test.txt:

This is a test.

This is a test

Answer 1

I've had the best luck using the io module to open the file with an explicit encoding.

import io
with io.open(FILE, 'r', encoding='utf-16') as f:
    job = f.read()
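
For a self-contained check, the sketch below writes a small UTF-16 file and reads it back with io.open. The temporary file path and the sample text are assumptions made for illustration, not the asker's actual file. It is written to run on both Python 2.7 and Python 3.

```python
from __future__ import print_function

import io
import os
import tempfile

# Hypothetical sample content, mirroring the test.txt shown in the question.
sample = u"This is a test.\n\nThis is a test"

# Write the sample as UTF-16 (the codec prepends a byte order mark).
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with io.open(path, "w", encoding="utf-16") as f:
    f.write(sample)

# Read it back with the same explicit encoding; the BOM is consumed on decode.
with io.open(path, "r", encoding="utf-16") as f:
    text = f.read()

os.remove(path)

# Splitting the decoded text now yields whole words instead of NUL-padded bytes.
words = text.split()
print(words)
```

On Python 2 the list elements are unicode objects, so they print with a u prefix.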

Answer 2

You have a text encoding problem. Your file is encoded as UTF-16, not UTF-8. Instead of using open, use:

import codecs
with codecs.open("test.txt", "r", encoding="utf-16") as f:
    text = f.read()

Or switch to Python 3, which has better Unicode support.
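
The \xff\xfe at the start of the output in the question is the UTF-16 little-endian byte order mark (BOM), which is one way to recognize the encoding. Below is a minimal sketch, assuming the file starts with a BOM. The helper name decode_with_bom is hypothetical, and it only checks for UTF-8 and UTF-16 marks.

```python
import codecs


def decode_with_bom(raw, default="utf-8"):
    """Decode bytes, choosing the codec from a leading BOM if one is present."""
    # Hypothetical helper: only UTF-8 and UTF-16 BOMs are checked here.
    if raw.startswith(codecs.BOM_UTF8):
        return raw.decode("utf-8-sig")
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        # The "utf-16" codec reads the BOM to choose the byte order and drops it.
        return raw.decode("utf-16")
    # No recognized BOM: fall back to the assumed default encoding.
    return raw.decode(default)


# The first bytes shown in the question's output.
raw = b"\xff\xfeT\x00h\x00i\x00s\x00"
text = decode_with_bom(raw)
```

To use it on a real file, read the file in binary mode ('rb') and pass the bytes to the helper.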

Read a file into a string (Python)
