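I am trying to parse an XML file in Python using SAX.
The document contains several elements with the same name. I want to print some attributes of each of those elements, but the program only prints the attributes of the last one it encounters in the document.
Here is the code: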
# art.py
import sys
from xml.sax import make_parser
from handlers import ArticleHandler
ch = ArticleHandler()
saxparser = make_parser()
saxparser.setContentHandler(ch)
saxparser.parse(sys.stdin)
# Printed once, after parsing has finished
print("TYPE:", ch.TYPE)
print("SUBTYPE:", ch.SUBTYPE)
# handlers.py
from xml.sax.handler import ContentHandler

class ArticleHandler(ContentHandler):
    TYPE = ""
    SUBTYPE = ""

    def startElement(self, name, attrs):
        if name == "relation":
            self.TYPE = attrs.get("TYPE", "")
            self.SUBTYPE = attrs.get("SUBTYPE")
Here is the XML:
<relation ID="CNN_CF_20030303.1900.00-R3" TYPE="ORG-AFF" SUBTYPE="Employment">
...
</relation>
<relation ID="CNN_CF_20030303.1900.00-R4" TYPE="ORG-AFF" SUBTYPE="Membership">
...
</relation>
For this input, the output is
TYPE: ORG-AFF
SUBTYPE: Membership
whereas the expected output is
TYPE: ORG-AFF
SUBTYPE: Employment
TYPE: ORG-AFF
SUBTYPE: Membership
How can I fix this?
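Answer 0 (score: 0)
Your handler overwrites TYPE and SUBTYPE every time it sees a relation tag, and you only print them once after parsing has finished, so only the last values survive. You have to rewrite the program to handle multiple relation tags, for example by collecting them in a list: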
import sys
from xml.sax import make_parser
from xml.sax.handler import ContentHandler

class ArticleHandler(ContentHandler):
    def __init__(self):
        ContentHandler.__init__(self)
        # One (TYPE, SUBTYPE) tuple per <relation> element, in document order
        self.relations = []

    def startElement(self, name, attrs):
        if name == "relation":
            self.relations.append((attrs.get("TYPE", ""), attrs.get("SUBTYPE")))

ch = ArticleHandler()
saxparser = make_parser()
saxparser.setContentHandler(ch)
saxparser.parse(sys.stdin)
# rel_type rather than type, to avoid shadowing the built-in
for rel_type, subtype in ch.relations:
    print("TYPE:", rel_type)
    print("SUBTYPE:", subtype)
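To try the list-based approach without piping a file through stdin, here is a minimal self-contained sketch for Python 3. It assumes the relation elements sit under a single root element; the `<relations>` wrapper, the `RelationCollector` class, the `collect_relations` helper and the `SAMPLE` string are made up for illustration and are not part of the original question.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class RelationCollector(ContentHandler):
    """Hypothetical handler: stores (TYPE, SUBTYPE) for every <relation>."""

    def __init__(self):
        super().__init__()
        self.relations = []

    def startElement(self, name, attrs):
        # Called once per opening tag, so append instead of overwriting.
        if name == "relation":
            self.relations.append(
                (attrs.get("TYPE", ""), attrs.get("SUBTYPE", ""))
            )


# Assumed sample input: the two relations from the question, wrapped in a
# made-up <relations> root so that the document is well-formed.
SAMPLE = b"""<relations>
<relation ID="CNN_CF_20030303.1900.00-R3" TYPE="ORG-AFF" SUBTYPE="Employment"></relation>
<relation ID="CNN_CF_20030303.1900.00-R4" TYPE="ORG-AFF" SUBTYPE="Membership"></relation>
</relations>"""


def collect_relations(xml_bytes):
    """Parse xml_bytes and return the collected (TYPE, SUBTYPE) tuples."""
    handler = RelationCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.relations


for rel_type, subtype in collect_relations(SAMPLE):
    print("TYPE:", rel_type)
    print("SUBTYPE:", subtype)
```

If you only need to print the values and never use them afterwards, printing directly inside `startElement` would also work; keeping them in a list just makes them available once parsing is done.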