I am using lxml with Python 3.5 to parse an XML file.
My code so far is:
for event, element in etree.iterparse(source, tag="article"):
    for child in element:
        print(child.tag, child.text)
    element.clear()
After running for a while, it fails with the following message:
lxml.etree.XMLSyntaxError: Entity 'ouml' not defined, line 47, column 25
I have a DTD file in which all the entities are defined. How can I include that file or define the missing entities?
Answer 0 (score: 0)
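Here is my solution. I read the DTD file for validation, as CoderBC suggested, and pass load_dtd=True to iterparse so that the entities are resolved: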
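import sys

from lxml import etree

source = sys.argv[1]
# Parse the DTD given on the command line. The object can be used for explicit
# validation via dtd.validate(...), which this script does not call.
dtd = etree.DTD(file=sys.argv[2])

count = 0
# load_dtd=True makes the parser load the DTD named in the document's DOCTYPE,
# so entities such as &ouml; are defined while parsing.
# tag="article" (as in the question) keeps children intact until they are printed.
for event, element in etree.iterparse(source, tag="article", load_dtd=True):
    count += 1
    # Print all children of the current element
    for child in element:
        print(child.tag, child.text)
    # Free the memory held by the processed element
    element.clear()
print("Final Count :", count)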
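To see the effect of load_dtd=True in isolation, here is a minimal, self-contained sketch. The file names (sample.dtd, sample.xml), the records root element, and the sample data are made up for illustration and are not from the original files. It assumes the XML document names its DTD in a DOCTYPE declaration with a path relative to the XML file, which is where lxml looks when load_dtd=True is set.

```python
import os
import tempfile

from lxml import etree

# Hypothetical sample DTD: declares the structure and the 'ouml' entity.
DTD_TEXT = """<!ELEMENT records (article*)>
<!ELEMENT article (author, title)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ENTITY ouml "&#246;">
"""

# Hypothetical sample document: the DOCTYPE points at the DTD file above.
XML_TEXT = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE records SYSTEM "sample.dtd">
<records>
<article><author>J&ouml;rg</author><title>First</title></article>
<article><author>Ann</author><title>Second</title></article>
</records>
"""


def collect_articles(xml_path):
    """Return one list of (tag, text) pairs per <article> element."""
    rows = []
    # load_dtd=True loads the DTD referenced by the DOCTYPE, so &ouml; is defined.
    for _event, element in etree.iterparse(xml_path, tag="article", load_dtd=True):
        rows.append([(child.tag, child.text) for child in element])
        element.clear()  # release the subtree once it has been read
    return rows


with tempfile.TemporaryDirectory() as workdir:
    # Write the DTD and the XML side by side so the relative reference resolves.
    with open(os.path.join(workdir, "sample.dtd"), "w", encoding="utf-8") as fh:
        fh.write(DTD_TEXT)
    xml_path = os.path.join(workdir, "sample.xml")
    with open(xml_path, "w", encoding="utf-8") as fh:
        fh.write(XML_TEXT)
    articles = collect_articles(xml_path)

for article in articles:
    print(article)
```

If the DOCTYPE in your file does not point at the DTD, or the DTD sits in a different directory, the entity will still be reported as undefined; in that case adjusting the DOCTYPE path or placing the DTD next to the XML file is the simplest thing to try.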