Python xml.etree.ElementTree: parsing a forward slash

Asked: 2014-11-20 19:02:30

Tags: python xml xml-parsing elementtree

I'm trying to parse XML returned by Stanford CoreNLP in Python using the xml.etree.ElementTree module, but I keep running into this error.

Here is the error I get:

File "my_script.py", line 5
    root = ET.fromstring(content)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1300, in XML
    parser.feed(text)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
    self._raiseerror(v)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 4473, column 19

I looked at line 4473 of the XML file:

<word>5 1/2</word>

Column 19 is where the 5 begins.

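I think the problem is caused by the forward slash in the number 5 1/2, because this is the first place 5 1/2 appears in the XML file. Does anyone know how I can still parse an XML file that contains forward slashes?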
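As a quick sanity check on the slash hypothesis, here is a minimal sketch that feeds ElementTree a hand-typed element with the same visible text. The sample string is made up for illustration and is not copied from the CoreNLP output, so it only shows how a plain ASCII forward slash in element text is handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample typed by hand; it is NOT taken from the real CoreNLP file.
sample = "<word>5 1/2</word>"

# A forward slash is ordinary character data in XML, so this parses cleanly.
elem = ET.fromstring(sample)
print(elem.text)
```

If this runs without a ParseError, the slash on its own is probably not what the parser is rejecting, and the raw bytes at the reported position are worth a closer look.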

Here is the code:

import xml.etree.ElementTree as ET
# Read the whole file; the with block closes it even if reading fails.
with open("samplefiles/samplefile999.txt.xml", "r") as f:
    content = f.read()
root = ET.fromstring(content)
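To see what the parser is actually choking on, one option is to catch the ParseError and print the raw bytes around the reported position. The sketch below is only a diagnostic aid. The helper name `show_parse_error_context` and its `width` parameter are hypothetical, and it assumes the file is read in binary mode so that invisible or non-ASCII bytes show up in the `repr`. The cause it looks for (a byte that is not valid in the document's encoding) is an assumption, not something the traceback confirms.

```python
import xml.etree.ElementTree as ET


def show_parse_error_context(data, width=20):
    """Return (line, column, snippet) for the first parse error, or None.

    data must be bytes. snippet holds the raw bytes near the reported
    position; the column is treated as an approximate offset into the line.
    """
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple; line is 1-based.
        line_no, col = err.position
        line = data.splitlines()[line_no - 1]
        snippet = line[max(0, col - width):col + width]
        return line_no, col, snippet
    return None


# Made-up input for illustration: byte 0xA0 is not valid UTF-8 by itself.
bad = b"<root>\n<word>5\xa01/2</word>\n</root>"
result = show_parse_error_context(bad)
if result is not None:
    print(result[0], result[1], repr(result[2]))
```

For the file in the question, the call would be `show_parse_error_context(open("samplefiles/samplefile999.txt.xml", "rb").read())`. Printing the `repr` of the snippet makes escapes such as `\xa0` visible where an editor might show what looks like a plain space.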

0 answers:

No answers.