Question

I'm having problems reading from a file, processing its string, and saving to a UTF-8 file.

Here is the code:

try:
    filehandle = open(filename,"r")
except:
    print("Could not open file " + filename)
    quit() 

text = filehandle.read()
filehandle.close()

Then I do some processing on the variable text.

And then:

try:
    writer = open(output,"w")
except:
    print("Could not open file " + output)
    quit() 

#data = text.decode("iso 8859-15")    
#writer.write(data.encode("UTF-8"))
writer.write(text)
writer.close()

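This writes the file correctly, but according to my editor it does so in ISO 8859-15. Since the same editor recognizes the input file (in the variable filename) as UTF-8, I don't know why this happens. As far as my research has shown, the commented-out lines should solve the problem. However, when I use those lines, the resulting file has gibberish, mainly in the special characters; the text is in Spanish, so these are the words with tildes. I would really appreciate any help, as I am stumped...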

Answer 1

Use the codecs module to convert text to and from Unicode at the I/O boundaries of your program:

import codecs
with codecs.open(filename, 'r', encoding='utf8') as f:
    text = f.read()
# process Unicode text
with codecs.open(filename, 'w', encoding='utf8') as f:
    f.write(text)

Edit: The io module is now recommended instead of codecs, and it is compatible with Python 3's open syntax:

import io
with io.open(filename, 'r', encoding='utf8') as f:
    text = f.read()
# process Unicode text
with io.open(filename, 'w', encoding='utf8') as f:
    f.write(text)
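
To make the round trip concrete, here is a minimal sketch rather than a definitive fix. It assumes the input may really be ISO 8859-15 (the question is unclear on this), so the source encoding is a parameter. The helper name convert_to_utf8 and the temporary file names are hypothetical and only there for illustration:

```python
import io
import os
import tempfile


def convert_to_utf8(src_path, dst_path, src_encoding="utf-8"):
    """Read src_path as src_encoding and write the text back out as UTF-8."""
    with io.open(src_path, "r", encoding=src_encoding) as f:
        text = f.read()  # bytes are decoded to Unicode at the input boundary
    # ... process the Unicode text here ...
    with io.open(dst_path, "w", encoding="utf-8") as f:
        f.write(text)  # Unicode is encoded to UTF-8 at the output boundary
    return text


# Demo: a throwaway file holding Spanish text encoded as ISO 8859-15.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "input.txt")
dst = os.path.join(tmp_dir, "output.txt")
with io.open(src, "wb") as f:
    f.write(u"ma\u00f1ana".encode("iso-8859-15"))

convert_to_utf8(src, dst, src_encoding="iso-8859-15")

# Read the result back as raw bytes to check the encoding actually written.
with io.open(dst, "rb") as f:
    utf8_bytes = f.read()
```

If the input really is UTF-8, as the editor claims, leave src_encoding at its default; decoding with the wrong source encoding is what produces garbled accented characters.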

Answer 2

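You can also get around it with the code below (Python 3):

# errors="ignore" skips bytes that are not valid UTF-8 instead of raising an error
with open(completefilepath, 'r', encoding='utf8', errors="ignore") as file:
    text = file.read()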
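
One caveat with errors="ignore": undecodable bytes are dropped silently, so characters can go missing. The sketch below, which assumes Python 3 and uses a hypothetical temporary file, shows what happens when ISO 8859-15 bytes are read as UTF-8, and compares it with errors="replace":

```python
import os
import tempfile

# Spanish text saved as ISO 8859-15: its 0xF1 byte is not valid UTF-8 here.
path = os.path.join(tempfile.mkdtemp(), "latin.txt")
with open(path, "wb") as f:
    f.write("ma\u00f1ana".encode("iso-8859-15"))

with open(path, "r", encoding="utf8", errors="ignore") as f:
    ignored = f.read()   # the undecodable byte is dropped silently

with open(path, "r", encoding="utf8", errors="replace") as f:
    replaced = f.read()  # the undecodable byte becomes U+FFFD

print(repr(ignored), repr(replaced))
```

With "replace" the damage is at least visible in the output, which makes a wrong encoding guess easier to spot.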

Answer 3

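You can't do that with the built-in open in Python 2. Use codecs.

In Python 2, the built-in open function takes no encoding argument and reads and writes raw byte strings, so it will not encode your text as UTF-8 for you. To write UTF-8, try: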

import codecs
file = codecs.open('data.txt', 'w', 'utf-8')
file.write(text)  # text must be a Unicode string; it is encoded as UTF-8 on write
file.close()

Python: reading from a file and saving to UTF-8