Using ... with SAX

Time: 2015-05-30 15:15:42

Tags: python xml sax

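I am trying to parse an XML file with SAX in Python.

The document contains several elements with the same name. I want to print some attributes of each of these elements, but the program only prints the attributes of the last one encountered in the document.

Here is the code: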

# art.py
import sys

from xml.sax import make_parser 
from handlers import ArticleHandler

ch = ArticleHandler( ) 
saxparser = make_parser( )

saxparser.setContentHandler(ch)
saxparser.parse(sys.stdin)

print "TYPE:", ch.TYPE
print "SUBTYPE:" , ch.SUBTYPE


# handlers.py
from xml.sax.handler import ContentHandler

class ArticleHandler(ContentHandler):

 TYPE = ""
 SUBTYPE = ""

 def startElement(self, name, attrs):
     if name == "relation":
         self.TYPE = attrs.get("TYPE", "") 
         self.SUBTYPE = attrs.get("SUBTYPE")

Here is the XML:

    <relation ID="CNN_CF_20030303.1900.00-R3" TYPE="ORG-AFF" SUBTYPE="Employment">
    ...
    </relation>
    <relation ID="CNN_CF_20030303.1900.00-R4" TYPE="ORG-AFF" SUBTYPE="Membership">
    ...
    </relation>

For this input, the output is

    TYPE:ORG-AFF
    SUBTYPE:Membership

while the expected output is

    TYPE:ORG-AFF
    SUBTYPE:Employment
    TYPE:ORG-AFF
    SUBTYPE:Membership

How can I fix this?

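1 answer:

Answer 0: (score: 0)

The handler keeps only one TYPE and one SUBTYPE value, so each relation element overwrites the previous one. You have to rewrite the program to handle multiple relation tags, for example by collecting them in a list: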

import sys
from xml.sax import make_parser 
from xml.sax.handler import ContentHandler

class ArticleHandler(ContentHandler):
    def __init__(self):
        ContentHandler.__init__(self)
        # One (TYPE, SUBTYPE) tuple per <relation> element, in document order
        self.relations = []

    def startElement(self, name, attrs):
        if name == "relation":
            self.relations.append((attrs.get("TYPE", ""), attrs.get("SUBTYPE")))

ch = ArticleHandler() 
saxparser = make_parser()
saxparser.setContentHandler(ch)
saxparser.parse(sys.stdin)

for type, subtype in ch.relations:
    print "TYPE:", type
    print "SUBTYPE:" , subtype
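
For anyone who wants to try the list approach without piping a file through stdin, here is a minimal sketch of the same idea that parses an in-memory string. It is written for Python 3 (print as a function), unlike the code above. The names RelationCollector, collect_relations and SAMPLE are made up for illustration, and the <relations> root element and the empty relation bodies are assumptions, since the question does not show the rest of the document.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class RelationCollector(ContentHandler):
    """Collects a (TYPE, SUBTYPE) tuple for every <relation> start tag."""

    def __init__(self):
        super().__init__()
        self.relations = []

    def startElement(self, name, attrs):
        if name == "relation":
            # Append instead of overwriting, so earlier elements are kept.
            self.relations.append((attrs.get("TYPE", ""), attrs.get("SUBTYPE", "")))


def collect_relations(xml_bytes):
    """Parse the given XML bytes and return the collected tuples."""
    handler = RelationCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.relations


# Assumed wrapper: a hypothetical <relations> root around the two elements
# from the question, with their elided contents left empty.
SAMPLE = b"""<relations>
  <relation ID="CNN_CF_20030303.1900.00-R3" TYPE="ORG-AFF" SUBTYPE="Employment"></relation>
  <relation ID="CNN_CF_20030303.1900.00-R4" TYPE="ORG-AFF" SUBTYPE="Membership"></relation>
</relations>"""

if __name__ == "__main__":
    for rel_type, subtype in collect_relations(SAMPLE):
        print("TYPE:", rel_type)
        print("SUBTYPE:", subtype)
```

Returning the list from a function rather than printing inside the handler keeps the parsing separate from the output, so the same tuples can be printed, filtered or written elsewhere.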