I am trying to run this code:
import glob
import io
read_files = filter(lambda f: f!='final.txt' and f!='result.txt', glob.glob('*.txt'))
with io.open("REGEXES.rx.txt", "w", encoding='UTF-32') as outfile:
    for f in read_files:
        with open(f, "r") as infile:
            outfile.write(infile.read())
            outfile.write('|')
to combine some text files, and I get this error:
Traceback (most recent call last):
  File "/Users/kosay.jabre/Desktop/Password Assessor/RegexesNEW/CombineFilesCopy.py", line 10, in <module>
    outfile.write(infile.read())
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa3 in position 2189: ordinal not in range(128)
I have tried the UTF-8, UTF-16, UTF-32 and latin-1 encodings. Any ideas?
Answer 0 (score: 2)
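The error comes from infile.read(). The input file is opened in text mode with no encoding specified, so Python falls back to the platform's default encoding (locale.getpreferredencoding()), which here is ASCII, as the encodings/ascii.py frame in the traceback shows. Any byte greater than \x7f (127) is not ASCII, so decoding it raises an error.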
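To see which encoding Python falls back to on a given machine, a quick check along these lines may help. It is only a sketch: the printed encoding depends on the OS and locale settings, so no particular value is assumed, and the variable names are illustrative.

```python
import locale

# Encoding that open() uses in text mode when no encoding= argument is given
# (Python 3.5, as in the traceback; the value depends on OS and locale).
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)

# Byte 0xa3 from the traceback lies outside the ASCII range (0-127),
# so the ASCII codec cannot decode it.
try:
    b"\xa3".decode("ascii")
    ascii_failed = False
except UnicodeDecodeError as exc:
    ascii_failed = True
    print(exc)
```

If the first line printed is an ASCII variant (for example US-ASCII or ANSI_X3.4-1968), every open() call without an explicit encoding will fail on non-ASCII input in the same way.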
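Before going any further, you need to know what encoding your files actually use. If Python reads a file with one encoding while the file was written in another, you will either get an error or silently get mojibake.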
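The two failure modes can be reproduced with a couple of bytes. The sketch below uses the pound sign (U+00A3) purely as an illustration, chosen because its UTF-8 encoding contains the 0xa3 byte seen in the traceback; the actual contents of the files are not known.

```python
# The pound sign (U+00A3) encoded as UTF-8 is two bytes: 0xc2 0xa3.
data = "\u00a3".encode("utf-8")

# Decoding with the correct codec round-trips cleanly.
right = data.decode("utf-8")      # '£'

# Decoding with the wrong codec can "succeed" and produce mojibake:
# latin-1 maps every byte to some character, so no error is raised.
wrong = data.decode("latin-1")    # 'Â£'

# Decoding with a codec that cannot represent the bytes raises an error.
try:
    data.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(repr(right), repr(wrong), ascii_ok)
```

This is also why latin-1 never raises on read: it accepts any byte sequence, which makes it a poor test of whether the guess was right.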
Assuming infile is UTF-8 encoded, change:
with open(f, "r") as infile:
to:
with open(f, "r", encoding="utf-8") as infile:
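You may also want to change the encoding of outfile to UTF-8 to avoid wasting storage, since UTF-32 spends four bytes on every character. Because the input is decoded to plain Unicode strings when it is read, the encodings of infile and outfile do not need to match.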
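Putting the pieces together, the whole script could look roughly like the sketch below. It assumes the input files really are UTF-8, which has not been confirmed. The function name combine_files and its parameters are hypothetical, and skipping the output file itself is an addition that the original script does not make.

```python
import glob
import os


def combine_files(pattern, out_path, in_encoding="utf-8", out_encoding="utf-8",
                  skip=("final.txt", "result.txt")):
    """Concatenate the files matching pattern into out_path, each followed by '|'."""
    out_abs = os.path.abspath(out_path)
    # Collect the inputs before the output file is opened. Sorting gives a
    # reproducible order; the output file is excluded because it may match
    # the same pattern if it is left over from an earlier run.
    paths = [p for p in sorted(glob.glob(pattern))
             if os.path.basename(p) not in skip
             and os.path.abspath(p) != out_abs]
    with open(out_path, "w", encoding=out_encoding) as outfile:
        for path in paths:
            # Explicit encoding: do not rely on the platform default.
            with open(path, "r", encoding=in_encoding) as infile:
                outfile.write(infile.read())
                outfile.write("|")
    return paths
```

Called as combine_files("*.txt", "REGEXES.rx.txt"), this mirrors the original script but reads and writes UTF-8. If the inputs turn out to use a different encoding, only in_encoding needs to change.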