I am parsing a 1.3 GB XML file with Python. Here is my code:
import xml.etree.ElementTree as etree
with open('SemCor+OMSTI/semcor+omsti.data.xml') as f:
    xml = f.read()
for event, elem in etree.iterparse(xml, events=('start', 'end', 'start-ns', 'end-ns')):
    print(event, elem)
But it produces the following output:
Traceback (most recent call last):
  File "parse.py", line 8, in <module>
    for event, elem in etree.iterparse(re.sub(r"(<\?xml[^>]+\?>)", r"\1<root>", xml) + "</root>", events=('start', 'end', 'start-ns', 'end-ns')):
  File "/home/himanshu/anaconda3/lib/python3.6/xml/etree/ElementTree.py", line 1242, in iterparse
    source = open(source, "rb")
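This is followed by the file's contents printed as a string (unparsed). I am following this tutorial for parsing a very large xml file.
The output is so large that I cannot see the exact error. However, if I press Ctrl+C while the data is still printing, before the error message appears, it shows an OSError.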
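For reference, the last frame of the traceback (source = open(source, "rb")) suggests that iterparse treats its first argument as a file name or file object rather than as XML text, which would be consistent with the OSError. Below is a minimal sketch under that assumption. The helper name stream_events and the tiny in-memory sample are hypothetical, used only so the example runs without the real 1.3 GB file; it also requests only 'start' and 'end' events, because 'start-ns' and 'end-ns' do not yield elements.

```python
import io
import xml.etree.ElementTree as etree


def stream_events(source, events=("start", "end")):
    """Yield (event, tag) pairs from a file path or a binary file object.

    Hypothetical helper: `source` is handed straight to iterparse, never
    read into a string first.
    """
    for event, elem in etree.iterparse(source, events=events):
        yield event, elem.tag
        if event == "end":
            # Drop the element's children and text once it is complete,
            # so memory use stays low on very large files.
            elem.clear()


# Tiny in-memory stand-in for the real file (assumed structure, not the
# actual SemCor+OMSTI schema).
sample = io.BytesIO(
    b"<corpus><text id='d000'><sentence id='s000'/></text></corpus>"
)
collected = list(stream_events(sample))
for event, tag in collected:
    print(event, tag)
```

With the real data, the path 'SemCor+OMSTI/semcor+omsti.data.xml' would be passed directly as source instead of the result of f.read().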