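I'm trying to figure out how to replace all accented characters (é, í, å, ...) with their Latin counterparts (e, i and a, respectively). I have tried several approaches, but each of them does something I don't understand that leaves ElementTree unable to parse the result afterwards with `.fromstring()`.

I also had to escape the `&` character, but I have already figured that out.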
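The escaping step can be checked on its own. The pattern in the question's code, `&(?!#\d{3};|amp;)`, escapes every `&` that does not already start `&amp;` or a three-digit decimal reference such as `&#233;`, so it would double-escape `&lt;`, `&gt;`, `&quot;`, `&apos;`, hexadecimal references and decimal references of other lengths. A minimal sketch of a broader lookahead, assuming the input may contain any of those; the helper name `escape_bare_ampersands` is hypothetical:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical helper: escape "&" only when it does not already start one of
# the five predefined XML entities or a decimal/hexadecimal character reference.
_BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9a-fA-F]+);)")


def escape_bare_ampersands(text):
    return _BARE_AMP.sub("&amp;", text)


line = "<title>Tom & Jerry &amp; friends</title>"
escaped = escape_bare_ampersands(line)
print(escaped)  # <title>Tom &amp; Jerry &amp; friends</title>
print(ET.fromstring(escaped).text)  # Tom & Jerry & friends
```

Existing references such as `&lt;` or `&#xE9;` pass through unchanged, so the escaped line stays well-formed.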
Relevant code:
```python
# -*- coding: utf-8 -*-
import xml.etree.ElementTree as ET
import os
import re

path = "C:\\Users\\SuperUser\\Desktop\\audit\\audit\\saved\\audit"
root = ET.Element("root")
for filename in os.listdir(path):
    with open(path + "\\" + filename) as myfile:
        lines = myfile.readlines()
    for line in lines:
        # Escape "&" unless it already starts "&amp;" or a 3-digit character reference
        line = re.sub(r"&(?!#\d{3};|amp;)", "&amp;", line)
        xmlVal = ET.fromstring(line)
```
The error occurs on the last line: with the other solutions I tried, it raises `UnicodeEncodeError: 'ascii' codec can't encode character u'\xc4' in position 161: ordinal not in range(128)` or a similar error.
Answer (score: 1)

Try using the `unidecode` module.

Example:
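```python
# -*- coding: utf-8 -*-
import io
import os
import re
import xml.etree.ElementTree as ET

import unidecode

path = "C:\\Users\\SuperUser\\Desktop\\audit\\audit\\saved\\audit"
root = ET.Element("root")
for filename in os.listdir(path):
    # Decode each file explicitly (assumes the files are saved as UTF-8) so that
    # unidecode receives Unicode text rather than raw bytes.
    with io.open(os.path.join(path, filename), encoding="utf-8") as myfile:
        lines = myfile.readlines()
    for line in lines:
        # Replace accented characters with ASCII look-alikes (é -> e, å -> a)
        line = unidecode.unidecode(line)
        # Still escape bare ampersands, as in the question's code
        line = re.sub(r"&(?!#\d{3};|amp;)", "&amp;", line)
        xmlVal = ET.fromstring(line)
```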
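If installing a third-party package is not an option, the standard library's `unicodedata` module covers the accents mentioned in the question. This is a swapped-in technique, not what the answer above uses: a sketch assuming Python 3 and text that has already been decoded to `str`. NFKD normalization splits a precomposed character such as é into e plus a combining accent, and the combining marks are then dropped. Unlike `unidecode`, it leaves letters that have no decomposition (for example ø, ß or æ) unchanged. The function name `strip_accents` is hypothetical:

```python
import unicodedata
import xml.etree.ElementTree as ET


def strip_accents(text):
    """Hypothetical helper: replace accented letters with their base letters.

    NFKD turns each precomposed character into a base character followed by
    combining marks; the marks are then filtered out.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


line = "<name>Åsa Øberg, café, día</name>"
plain = strip_accents(line)
print(plain)  # <name>Asa Øberg, cafe, dia</name>
print(ET.fromstring(plain).text)  # Asa Øberg, cafe, dia
```

In the loop from the answer, this call would take the place of `unidecode.unidecode(line)`, with the files still opened using an explicit encoding.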