Python 3.6 utf-8 UnicodeEncodeError

Asked: 2018-04-22 05:05:27

Tags: python unicode utf-8

#!/usr/bin/env python3
import glob
import xml.etree.ElementTree as ET
filenames = glob.glob("C:\\Users\\####\\Desktop\\BNC2\\[A00-ZZZ]*.xml")
out_lines = []
for filename in filenames:
    with open(filename, 'r', encoding="utf-8") as content:
        tree = ET.parse(content)
        root = tree.getroot()
        for w in root.iter('w'):
            lemma = w.get('hw')
            pos = w.get('pos')
            tag = w.get('c5')

            out_lines.append(w.text + "," + lemma + "," + pos + "," + tag)

with open("C:\\Users\\####\\Desktop\\bnc.txt", "w") as out_file:
    for line in out_lines:
        line = bytes(line, 'utf-8').decode('utf-8', 'ignore')
        out_file.write("{}\n".format(line))

This gives the error:

UnicodeEncodeError: 'charmap' codec can't encode character '\u2192' in position 0: character maps to <undefined>

I thought this line would solve the problem...

line = bytes(line, 'utf-8').decode('utf-8', 'ignore')
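A minimal sketch of why that line has no effect. It assumes the output file fell back to a Windows code page such as cp1252, which is what a 'charmap' error usually suggests; the sample string is made up for illustration. Encoding a str to UTF-8 and decoding it straight back yields the same str, so the arrow character is still there when the file tries to encode it on write.

```python
# Sample line containing U+2192 (RIGHTWARDS ARROW), the character from the error.
line = "\u2192,arrow,SUBST,NN1"

# The round trip from the question: str -> UTF-8 bytes -> str.
# Nothing is dropped, because every str can be encoded as UTF-8.
round_tripped = bytes(line, "utf-8").decode("utf-8", "ignore")
round_trip_is_noop = (round_tripped == line)

# Assumption: the output file used cp1252, which has no mapping for U+2192.
try:
    line.encode("cp1252")
    cp1252_can_encode = True
except UnicodeEncodeError:
    cp1252_can_encode = False
```

The failure therefore comes from the encoding of the output file, not from the contents of `line`.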

2 answers:

Answer 0: (score: 2)

You need to specify the encoding when opening the output file, just as you did for the input file:

with open("C:\\Users\\####\\Desktop\\bnc.txt", "w", encoding="utf-8") as out_file:
    for line in out_lines:
        out_file.write("{}\n".format(line))
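As a quick check of this fix, the sketch below writes a couple of made-up lines with `encoding="utf-8"` and reads them back. The temporary directory and the sample data are assumptions for illustration only; they stand in for the real output path and the BNC-derived lines.

```python
import os
import tempfile

# Hypothetical sample lines; the first starts with U+2192 as in the error message.
out_lines = ["\u2192,arrow,SUBST,NN1", "caf\u00e9,cafe,SUBST,NN1"]

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "bnc.txt")

    # Explicit encoding: the write no longer depends on the platform default.
    with open(out_path, "w", encoding="utf-8") as out_file:
        for line in out_lines:
            out_file.write("{}\n".format(line))

    # Read back with the same encoding to confirm nothing was lost.
    with open(out_path, "r", encoding="utf-8") as in_file:
        read_back = in_file.read().splitlines()
```

Without the `encoding` argument, `open()` in text mode uses the platform's preferred encoding, which on many Windows setups is a legacy code page rather than UTF-8.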

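Answer 1: (score: -2)

If your script performs multiple read and write operations and you want to use one specific encoding (for example, utf-8) for all of them, you can also change the default encoding: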

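# Python 2 only: in Python 3, reload() is not a builtin and
# sys.setdefaultencoding() no longer exists.
import sys
reload(sys)
sys.setdefaultencoding('UTF8')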

Use this only when the script performs multiple reads/writes, and do it at the very beginning of the script.

Changing default encoding of Python?
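Since the question targets Python 3.6, where `sys.setdefaultencoding` is not available, one possible way to get a similar "one encoding everywhere" effect is to preset the `encoding` argument of `open()` once and reuse it. This is only a sketch: `open_utf8` is a hypothetical helper name, and the temporary file stands in for the script's real paths.

```python
import functools
import os
import tempfile

# Hypothetical helper: the built-in open() with encoding="utf-8" preset.
open_utf8 = functools.partial(open, encoding="utf-8")

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.txt")

    # Equivalent to open(path, "w", encoding="utf-8").
    with open_utf8(path, "w") as out_file:
        out_file.write("\u2192 arrow\n")

    # Equivalent to open(path, encoding="utf-8").
    with open_utf8(path) as in_file:
        text = in_file.read()
```

This keeps the encoding explicit at every call site without touching interpreter-wide settings.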