Question

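With the help of SO members I was able to get as far as the sample code below. Its only purpose is to merge the text files from a given folder and its subfolders and store the output as master.txt. But I occasionally get a traceback; it looks like an error is thrown while a file is being read.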

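Going by the suggestions, input, and some research, it seems best either to clean the text files to a uniform Unicode encoding or to use some line-by-line function, so that junk characters and blank lines are trimmed as each line is read.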

import shutil
import os.path

root = 'C:\\Dropbox\\test\\'
files = [(path,f) for path,_,file_list in os.walk(root) for f in file_list]

with open('C:\\Dropbox\\Python\\master.txt','wb') as output:
    for path, f_name in files:
        with open(os.path.join(path, f_name), 'rb') as input:
            shutil.copyfileobj(input, output)
        output.write(b'\n') # insert extra newline 

with open('master.txt', 'r') as f:
  lines = f.readlines()
with open('master.txt', 'w') as f:
  f.write("".join(L for L in lines if L.strip()))

The traceback I get:

Traceback (most recent call last):
  File "C:\Dropbox\Python\master1.py", line 14, in <module>
    lines = f.readlines()
  File "C:\PYTHON32\LIB\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 8159: character maps to <undefined>

Answer 1

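You have opened master.txt in text mode. When you then read() from it, Python decodes the contents using the system's default encoding (cp1252 here, according to the traceback). Since you get a UnicodeDecodeError, the file is evidently in a different encoding.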
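To see why byte 0x81 triggers the error, here is a minimal sketch. The helper name `can_decode` is made up for illustration; it only relies on `bytes.decode` and `locale.getpreferredencoding` from the standard library. Byte 0x81 has no mapping in Python's cp1252 table, while latin-1 maps every byte value to some character.

```python
import locale

def can_decode(data, encoding):
    """Return True if `data` (bytes) decodes cleanly with `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

# The encoding that open(path, 'r') falls back to when none is given.
print(locale.getpreferredencoding(False))

# 0x81 is undefined in cp1252, so decoding fails as in the traceback.
print(can_decode(b'\x81', 'cp1252'))

# latin-1 assigns a character to every byte, so it never fails
# (the result may still be the wrong text if the file is not latin-1).
print(can_decode(b'\x81', 'latin-1'))
```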

Either open the file in binary mode, or specify the correct encoding.
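Both options could look roughly like the sketch below. The function names are hypothetical, and the `'utf-8'` default is only an assumption, since the question does not say what encoding the source files use. The binary variant never decodes anything, so it cannot raise UnicodeDecodeError, but it assumes an ASCII-compatible encoding (it would mangle UTF-16, for example).

```python
import os
import tempfile

def strip_blank_lines_binary(path):
    """Drop blank/whitespace-only lines, working on raw bytes."""
    with open(path, 'rb') as f:
        lines = f.readlines()
    with open(path, 'wb') as f:
        # bytes.strip() removes ASCII whitespace, including \r and \n.
        f.writelines(line for line in lines if line.strip())

def strip_blank_lines_text(path, encoding='utf-8'):
    """Drop blank lines in text mode with an explicit encoding.

    The 'utf-8' default is an assumption; pass the file's real encoding.
    newline='' keeps the original line endings untouched.
    """
    with open(path, 'r', encoding=encoding, newline='') as f:
        lines = f.readlines()
    with open(path, 'w', encoding=encoding, newline='') as f:
        f.writelines(line for line in lines if line.strip())

# Small demonstration on a temporary file containing the offending byte.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, 'master.txt')
    with open(demo, 'wb') as f:
        f.write(b'first\r\n\r\n\x81second\n   \nthird\n')
    strip_blank_lines_binary(demo)
    with open(demo, 'rb') as f:
        print(f.read())
```

The binary variant is the safer choice when the merged files may come in mixed encodings, because the bytes are passed through unchanged. The text variant is only appropriate when every input file shares one known encoding.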
