UnicodeDecodeError when concatenating text files in Python

Asked: 2019-01-14 10:17:39

Tags: python-3.x file-io text-files python-unicode corpus

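I am a Python beginner. I am trying to concatenate the text from all 8 of my text files into a single text file to form a corpus. However, I get the error UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 7311: character maps to <undefined>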

import glob2

filenames = glob2.glob('Final_Corpus_SOAs/*.txt')  # list of all .txt files in the directory
print(filenames)

Output: ['Final_Corpus_SOAs\\1.txt', 'Final_Corpus_SOAs\\2.txt', 'Final_Corpus_SOAs\\2018 SOA Muir.txt', 'Final_Corpus_SOAs\\3.txt', 'Final_Corpus_SOAs\\4.txt', 'Final_Corpus_SOAs\\5.txt', 'Final_Corpus_SOAs\\6.txt', 'Final_Corpus_SOAs\\7.txt', 'Final_Corpus_SOAs\\8.txt']

with open('output.txt', 'w', encoding="utf-8") as outfile:
    for fname in filenames:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)

Output: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 7311: character maps to <undefined>

Thanks for your help.

2 answers:

Answer 0 (score: 0)

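You should specify the encoding when opening the file. See the link for more information, since this has already been answered there.

Add the following argument to the open() call in your code: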

encoding="utf8"
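
As a rough illustration of why the read fails without an explicit encoding, the sketch below assumes (the question does not confirm this) that the input files are UTF-8 and contain a right curly quote, whose UTF-8 encoding ends in the byte 0x9d, and that the platform default codec is cp1252, as is common on Windows. The helper name try_decode is hypothetical.

```python
# Assumption: the corpus files are UTF-8 and contain a right double
# quotation mark (U+201D), which UTF-8 encodes as the bytes E2 80 9D.
raw = "\u201d".encode("utf-8")


def try_decode(data, encoding):
    """Return the decoded text, or a description of the decoding error."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"{type(exc).__name__}: {exc}"


# Decoding with the matching codec succeeds.
print(try_decode(raw, "utf-8"))

# cp1252 has no character assigned to byte 0x9d, so decoding fails.
print(try_decode(raw, "cp1252"))
```

If the second call reports byte 0x9d as undecodable, that matches the error in the question and suggests that passing encoding="utf8" when reading is the right fix.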

Answer 1 (score: 0)

If you are sure of the encoding, you should declare it when opening the files, both for reading and for writing:

encoding = 'utf8'    # or 'latin1' or 'cp1252' or...

with open('output.txt', 'w', encoding=encoding) as outfile:
    for fname in filenames:
        with open(fname, encoding=encoding) as infile:
            for line in infile:
                outfile.write(line)
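
The same idea can be wrapped in a small reusable function. This is only a sketch: the function name concatenate_text and its errors parameter are illustrative additions, not part of the original answer. The errors argument is passed straight to the built-in open(), so errors="replace" would substitute U+FFFD for undecodable bytes instead of raising.

```python
def concatenate_text(filenames, out_path, encoding="utf-8", errors="strict"):
    """Write the text of every file in `filenames` to `out_path`, in order.

    The same `encoding` is used for reading and for writing, which assumes
    that all input files share one encoding.
    """
    with open(out_path, "w", encoding=encoding) as outfile:
        for fname in filenames:
            # `errors` controls what happens on undecodable bytes:
            # "strict" raises UnicodeDecodeError, "replace" inserts U+FFFD.
            with open(fname, encoding=encoding, errors=errors) as infile:
                for line in infile:
                    outfile.write(line)
```

With the list from the question, a call such as concatenate_text(filenames, 'output.txt', encoding='utf8') would be equivalent to the loop above.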

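If you are unsure of the encoding, or do not want to be bothered with it, you can copy the files at the byte level by reading and writing them in binary mode: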

with open('output.txt', 'wb') as outfile:
    for fname in filenames:
        with open(fname, 'rb') as infile:
            for line in infile:
                outfile.write(line)
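
A possible variant of the binary approach uses shutil.copyfileobj from the standard library, which copies in chunks rather than line by line. The function name concatenate_bytes is hypothetical. Note that a byte-level copy decodes nothing, so the output keeps whatever encoding each input had; if the inputs use different encodings, the result will be a mix.

```python
import shutil


def concatenate_bytes(filenames, out_path):
    """Append the raw bytes of each file in `filenames` to `out_path`."""
    with open(out_path, "wb") as outfile:
        for fname in filenames:
            with open(fname, "rb") as infile:
                # Copies the whole file in chunks; no decoding happens,
                # so no UnicodeDecodeError can be raised here.
                shutil.copyfileobj(infile, outfile)
```

Because nothing is decoded, a byte such as 0x9d passes through unchanged.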