I am trying to parse a large XML file (~500 MB) with iterparse from the Python lxml library, using:
from lxml import etree

context = etree.iterparse('large-file.xml')
for event, element in context:
    # do some stuff
    element.clear()
But it fails with the following error:
Traceback (most recent call last):
  File "test.py", line 176, in <module>
    test_parser()
  File "test.py", line 121, in test_parser
    for event, element in context:
  File "src/lxml/iterparse.pxi", line 208, in lxml.etree.iterparse.__next__ (src/lxml/etree.c:155963)
  File "src/lxml/iterparse.pxi", line 193, in lxml.etree.iterparse.__next__ (src/lxml/etree.c:155671)
  File "src/lxml/iterparse.pxi", line 228, in lxml.etree.iterparse._read_more_events (src/lxml/etree.c:156298)
  File "src/lxml/parser.pxi", line 1362, in lxml.etree._FeedParser.feed (src/lxml/etree.c:116552)
  File "src/lxml/parser.pxi", line 589, in lxml.etree._ParserContext._handleParseResult (src/lxml/etree.c:107619)
  File "src/lxml/parser.pxi", line 598, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/etree.c:107738)
  File "src/lxml/parser.pxi", line 709, in lxml.etree._handleParseResult (src/lxml/etree.c:109447)
  File "src/lxml/parser.pxi", line 638, in lxml.etree._raiseParseError (src/lxml/etree.c:108301)
  File "large-file.xml", line 20593
lxml.etree.XMLSyntaxError: internal error: Huge input lookup, line 20593, column 199
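
One thing that may be worth trying, assuming the "Huge input lookup" message comes from libxml2's built-in size limits rather than from malformed XML at line 20593: `etree.iterparse` accepts a `huge_tree` keyword that asks the parser to lift those limits. The sketch below is only an illustration; the helper name `count_elements` and the tiny temporary file are hypothetical stand-ins for the real processing and for large-file.xml, and lifting the limits is only sensible for input from a trusted source.

```python
import os
import tempfile

from lxml import etree


def count_elements(path, huge_tree=True):
    """Stream through an XML file and count the elements seen (hypothetical helper)."""
    count = 0
    # huge_tree=True asks libxml2 to lift its built-in size limits.
    context = etree.iterparse(path, events=("end",), huge_tree=huge_tree)
    for event, element in context:
        count += 1  # placeholder for "do some stuff"
        element.clear()  # drop the element's children and text to save memory
    return count


if __name__ == "__main__":
    # Tiny stand-in document; replace with the path to the real file.
    with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as fh:
        fh.write("<root><item>a</item><item>b</item></root>")
    try:
        print(count_elements(fh.name))
    finally:
        os.remove(fh.name)
```

If the error persists with `huge_tree=True`, the content at line 20593, column 199 of the file is probably worth inspecting directly.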