lxml XMLSyntaxError: Entity 'ouml' not defined

Time: 2016-12-22 16:45:07

Tags: python-3.x xml-parsing lxml

I am using lxml with Python 3.5 to parse an XML file.

My code so far:

from lxml import etree

# source is the path to the XML file
for event, element in etree.iterparse(source, tag="article"):
    for child in element:
        print(child.tag, child.text)
    element.clear()

After it has been running for a while, I get the following message:

 lxml.etree.XMLSyntaxError: Entity 'ouml' not defined, line 47, column 25

I have a DTD file in which all the entities are defined. How can I include that file, or define the missing entities?
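For reference, the error can be reproduced without the full data set. The sketch below is a minimal reproduction under assumptions: the `records`/`article`/`title` document and the helper name `try_parse` are hypothetical and only stand in for the real file. It feeds `iterparse` a document that uses `&ouml;` without any DTD declaring it.

```python
from io import BytesIO

from lxml import etree


def try_parse(xml_bytes):
    """Run iterparse over the bytes; return the error message, or None on success."""
    try:
        for event, element in etree.iterparse(BytesIO(xml_bytes), tag="article"):
            element.clear()
    except etree.XMLSyntaxError as exc:
        return str(exc)
    return None


# Hypothetical sample: &ouml; is used, but nothing declares it.
SAMPLE = b"<records><article><title>Schr&ouml;dinger</title></article></records>"

message = try_parse(SAMPLE)
print(message)
```

Only the five predefined XML entities (`&amp;`, `&lt;`, `&gt;`, `&apos;`, `&quot;`) are known to the parser without a DTD, which is why a named entity such as `&ouml;` fails here.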

1 answer:

Answer 0: (score: 0)

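Here is my solution. As CoderBC suggested, I read the DTD file for validation, and I pass load_dtd=True to iterparse so that the parser loads the DTD named in the document's DOCTYPE and can resolve the entities: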

from lxml import etree
import sys

source = sys.argv[1]
# read the DTD (parsed here but not used below; entity resolution comes from load_dtd=True)
dtd = etree.DTD(file=sys.argv[2])
count = 0
# iterate over <article> nodes, as in the question, so children are still intact when printed
for event, element in etree.iterparse(source, tag="article", load_dtd=True):
    count += 1
    # print all children
    for child in element:
        print(child.tag, child.text)
    # free the memory held by the processed element
    element.clear()

print("Final Count :", count)
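To see the effect of `load_dtd=True` in isolation, here is a small self-contained sketch. Everything about the sample data is an assumption made for illustration: the file names `sample.dtd` and `sample.xml`, the `records`/`article`/`title` structure, and the helper names `build_sample` and `parse_with_dtd` are hypothetical. It writes a DTD that declares `ouml`, writes an XML file whose DOCTYPE points at that DTD, and parses it with `iterparse`.

```python
import os
import tempfile

from lxml import etree

# Hypothetical DTD: declares the document structure and the ouml entity.
SAMPLE_DTD = (
    "<!ELEMENT records (article*)>\n"
    "<!ELEMENT article (title)>\n"
    "<!ELEMENT title (#PCDATA)>\n"
    '<!ENTITY ouml "&#246;">\n'
)

# Hypothetical XML: the DOCTYPE names the DTD file by a relative path.
SAMPLE_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE records SYSTEM "sample.dtd">\n'
    "<records><article><title>Schr&ouml;dinger</title></article></records>\n"
)


def build_sample(directory):
    """Write the DTD and XML side by side and return the XML path."""
    with open(os.path.join(directory, "sample.dtd"), "w", encoding="utf-8") as f:
        f.write(SAMPLE_DTD)
    xml_path = os.path.join(directory, "sample.xml")
    with open(xml_path, "w", encoding="utf-8") as f:
        f.write(SAMPLE_XML)
    return xml_path


def parse_with_dtd(xml_path):
    """Collect the text of each <title> child of every <article>."""
    titles = []
    # load_dtd=True makes the parser read the DTD named in the DOCTYPE.
    for event, element in etree.iterparse(xml_path, tag="article", load_dtd=True):
        for child in element:
            if child.tag == "title":
                titles.append(child.text)
        element.clear()
    return titles


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(parse_with_dtd(build_sample(tmp)))
```

Note that the DTD is found through the DOCTYPE declaration of the XML file, here as a path relative to that file, so the DTD has to sit where the DOCTYPE says it is. A separately constructed `etree.DTD` object plays no part in entity resolution; it is useful for validation, for example via `dtd.validate(tree)`.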