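I'm still fairly new to Python 3.x and have run into a problem with an XML file I'm using for testing and learning. Looking at the raw file (it's ASCII-encoded, by the way), I'm fairly sure the problem is that it contains a U+00A0 (non-breaking space) character.
The XML is as follows: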
<?xml version="1.0" encoding="utf-8"?>
<XMLSetData xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.clientsite.com/subdir/r2.4/v1">
<FileCreationDate>2018-05-05T11:35:44.1043858-05:00</FileCreationDate>
<XMLSetDataList>
<DataIDNumber>99345346</DataIDNumber>
<DataName>RSRS TVL5697 ULL Georgetown</DataName>
</XMLSetDataList>
</XMLSetData>
In Notepad++, the text between ULL and Georgetown shows up as "xA0" followed by a space rather than as two ordinary spaces. So when I run the following code:
import xml.etree.ElementTree as ET
# xml_file is the path to the XML file shown above
events = ("end", "start-ns", "end-ns")
for event, elem in ET.iterparse(xml_file, events=events):
    # Only "end" events carry a fully parsed element
    if event == "end":
        eltag = elem.tag
        eltext = elem.text
        print(eltag, eltext)
I get the following error:
  File "C:\Users\d\AppData\Local\Programs\Python\Python37-32\lib\xml\etree\ElementTree.py", line 1222, in iterator
    yield from pullparser.read_events()
  File "C:\Users\d\AppData\Local\Programs\Python\Python37-32\lib\xml\etree\ElementTree.py", line 1297, in read_events
    raise event
  File "C:\Users\d\AppData\Local\Programs\Python\Python37-32\lib\xml\etree\ElementTree.py", line 1269, in feed
    self._parser.feed(data)
  File "<string>", line None
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 6, column 30
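A plausible reading of this error, assuming the file really contains a lone 0xA0 byte (the Latin-1 / Windows-1252 encoding of U+00A0): that byte is neither ASCII nor valid UTF-8 on its own, so a parser that trusts the encoding="utf-8" declaration rejects it. The sketch below reproduces this on a shortened, made-up sample; `raw`, `is_valid_utf8`, and `parses` are illustrative names, not part of the original code.

```python
import xml.etree.ElementTree as ET

# Assumption: a single 0xA0 byte sits between "ULL" and " Georgetown",
# matching what Notepad++ displays. The sample is a shortened stand-in.
raw = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b"<DataName>RSRS TVL5697 ULL\xa0 Georgetown</DataName>"
)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def parses(data: bytes) -> bool:
    """Return True if ElementTree accepts the bytes as XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# Reinterpret the bytes as Latin-1 and re-encode them as real UTF-8.
repaired = raw.decode("latin-1").encode("utf-8")

print(is_valid_utf8(raw))  # False: 0xA0 alone is not a valid UTF-8 sequence
print(parses(raw))         # False: the parser reports an invalid token
print(parses(repaired))    # True: U+00A0 is now encoded as 0xC2 0xA0
```

If this reproduces the failure, the file is not actually ASCII or UTF-8 despite its declaration, and the stray byte is an encoding mismatch rather than an illegal XML character.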
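How can I fix or work around this? If I delete the xA0 part, the file parses fine, but something similar is obviously likely to come up again, and I'd like to handle it programmatically.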
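One possible way to handle this programmatically, sketched under the assumption that the file is plain ASCII apart from stray single-byte characters such as 0xA0: read the raw bytes, check whether they are valid UTF-8, and if not, reinterpret them in a fallback single-byte encoding before parsing. The helper name `iterparse_lenient` and the choice of `cp1252` as the fallback are assumptions for illustration. Note that this reads the whole file into memory, and that it would mangle a file that mixes genuine multi-byte UTF-8 with stray bytes.

```python
import io
import xml.etree.ElementTree as ET


def iterparse_lenient(path, events=("end",), fallback="cp1252"):
    """Hypothetical helper: like ET.iterparse, but tolerant of stray bytes.

    If the file is not valid UTF-8, its bytes are reinterpreted using the
    `fallback` encoding (an assumption about where the file came from) and
    re-encoded as UTF-8, so the encoding="utf-8" declaration becomes true.
    """
    with open(path, "rb") as fh:
        raw = fh.read()
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # Bytes undefined in the fallback encoding become U+FFFD.
        raw = raw.decode(fallback, errors="replace").encode("utf-8")
    return ET.iterparse(io.BytesIO(raw), events=events)


def print_elements(path):
    """Print tag and text for every element, mirroring the original loop."""
    events = ("end", "start-ns", "end-ns")
    for event, elem in iterparse_lenient(path, events=events):
        if event == "end":
            print(elem.tag, elem.text)
```

With this in place, `print_elements(xml_file)` would stand in for the original loop. The parsed text still contains U+00A0; if ordinary spaces are wanted, `elem.text.replace("\xa0", " ")` can be applied to non-empty text afterwards.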