Python XMLParser: when is the data() method called?

Date: 2012-06-11 15:54:11

Tags: python elementtree

I am learning Python and have some questions about how the XML parser (ElementTree's XMLParser) behaves.

I modified the example from the documentation:
class MaxDepth:                     # The target object of the parser
    path = ""
    def start(self, tag, attrib):   # Called for each opening tag.
        self.path += "/"+ tag
        print '>>> Entering - ' + self.path
    def end(self, tag):             # Called for each closing tag.
        print '<<< Leaving - ' + self.path
        if self.path.endswith('/'+tag):
            self.path = self.path[:-(len(tag)+1)]
    def data(self, data):
        if data:
            print '... data called ...'
            print data , 'length -' , len(data)
    def close(self):    # Called when all data has been parsed.
        return self

It prints the following output:

>>> Entering - /a
... data called ...

length - 1
... data called ...
   length - 2
>>> Entering - /a/b
... data called ...

length - 1
... data called ...
   length - 2
<<< Leaving - /a/b
... data called ...

length - 1
... data called ...
   length - 2
>>> Entering - /a/b
... data called ...

length - 1
... data called ...
     length - 4
>>> Entering - /a/b/c
... data called ...

length - 1
... data called ...
       length - 6
>>> Entering - /a/b/c/d
... data called ...

length - 1
... data called ...
       length - 6
<<< Leaving - /a/b/c/d
... data called ...

length - 1
... data called ...
     length - 4
<<< Leaving - /a/b/c
... data called ...

length - 1
... data called ...
   length - 2
<<< Leaving - /a/b
... data called ...

length - 1
<<< Leaving - /a
<__main__.MaxDepth instance at 0x10e7dd5a8>

My questions are:

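  1. When is the data() method called?
  2. Why is it called twice before a start tag?
  3. I couldn't find API documentation with more details about the data method. Where can I find a Javadoc-style API reference for classes like XMLParser?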

1 answer:

Answer 0 (score: 2)

If you modify the data method like this:

def data(self, data):
    if data:
        print '... data called ...'
        print repr(data), 'length -' , len(data)

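you will see why it is called several times: it is called for each chunk of text data between tags, and here the line break and the indentation that follows it arrive as separate chunks: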

>>> Entering - /a
... data called ...
'\n' length - 1
... data called ...
'  ' length - 2
>>> Entering - /a/b
... data called ...
'\n' length - 1
... data called ...
'  ' length - 2
<<< Leaving - /a/b
... data called ...
'\n' length - 1
... data called ...
'  ' length - 2
>>> Entering - /a/b
... data called ...
'\n' length - 1
... data called ...
'    ' length - 4
# ... etc ...
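
The question does not show its input or the code that drives the parser. The paths and indentation widths in the output match the `exampleXml` string from the `XMLParser` example in the Python documentation, so, assuming that input, a Python 3 port of the experiment could look like the sketch below (the code above is Python 2). Besides printing, it records every callback in `events`, an attribute added here only for illustration, so the text between two tags can be rebuilt afterwards. How many `data()` calls each text run produces may differ from the output above, since that depends on the Python and Expat versions in use.

```python
from xml.etree.ElementTree import XMLParser

# Assumed input: the exampleXml string from the XMLParser example in the Python
# docs (the question does not show its input, but the output matches this one).
EXAMPLE_XML = """
<a>
  <b>
  </b>
  <b>
    <c>
      <d>
      </d>
    </c>
  </b>
</a>"""


class MaxDepth:
    """Parser target that tracks the current path and logs every callback."""

    def __init__(self):
        self.path = ""
        self.events = []  # (kind, payload) tuples in call order; added for illustration

    def start(self, tag, attrib):  # called for each opening tag
        self.path += "/" + tag
        self.events.append(("start", self.path))
        print(">>> Entering - " + self.path)

    def end(self, tag):  # called for each closing tag
        self.events.append(("end", self.path))
        print("<<< Leaving - " + self.path)
        if self.path.endswith("/" + tag):
            self.path = self.path[: -(len(tag) + 1)]

    def data(self, data):  # called for each chunk of character data
        self.events.append(("data", data))
        print("... data called ...", repr(data), "length -", len(data))

    def close(self):  # called once all input has been parsed
        return self


target = MaxDepth()
parser = XMLParser(target=target)
parser.feed(EXAMPLE_XML)
result = parser.close()  # returns whatever target.close() returns
```

Joining consecutive `data` entries in `target.events` gives exactly the whitespace between each pair of tags, however many calls it took to deliver it.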

XMLParser is built on top of the Expat parser.
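
To look at the raw chunks without ElementTree in between, here is a minimal sketch that drives Expat directly through Python 3's `xml.parsers.expat` module; `collect_chunks` is a hypothetical helper written for this illustration. pyexpat's `buffer_text` attribute asks it to merge adjacent chunks where it can. Exactly where the text gets split depends on the Expat version and buffer size, so the only thing assumed to be identical across both runs is the joined text.

```python
import xml.parsers.expat


def collect_chunks(xml_text, buffer_text):
    """Hypothetical helper: return every string Expat passes to CharacterDataHandler."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = buffer_text  # True: pyexpat coalesces adjacent chunks where possible
    parser.CharacterDataHandler = chunks.append  # any callable taking one str works
    parser.Parse(xml_text, True)  # True marks this as the final piece of input
    return chunks


doc = "<a>\n  <b/>\n</a>"
unbuffered = collect_chunks(doc, False)
buffered = collect_chunks(doc, True)
print(unbuffered)
print(buffered)
```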

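In my experience, every streaming XML parser treats text data as a series of chunks: you have to concatenate the data events until you reach the next start-tag or end-tag event. Parsers usually split chunks at whitespace boundaries, but that is not guaranteed.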
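
A minimal sketch of that advice, assuming Python 3's `xml.etree.ElementTree.XMLParser` with a custom target: `TextCollector` (a hypothetical name) appends every `data()` chunk to a buffer and only joins it when the next `start()`, `end()` or `close()` call arrives. Feeding the document in arbitrary pieces shows why this matters, since a chunk boundary can then fall in the middle of a word rather than at whitespace.

```python
from xml.etree.ElementTree import XMLParser


class TextCollector:
    """Hypothetical parser target that joins data() chunks into one string per text run.

    A run ends at the next start tag, end tag, or close().
    """

    def __init__(self):
        self._buf = []   # pending chunks of the current text run
        self._path = []  # tags of the currently open elements
        self.runs = []   # (element path, joined text) for each non-blank run

    def _flush(self):
        text = "".join(self._buf)
        self._buf = []
        if text.strip():  # skip whitespace-only runs such as indentation
            self.runs.append(("/".join(self._path), text))

    def start(self, tag, attrib):
        self._flush()
        self._path.append(tag)

    def end(self, tag):
        self._flush()
        self._path.pop()

    def data(self, data):
        self._buf.append(data)  # may be only one piece of a longer text

    def close(self):
        self._flush()
        return self.runs


parser = XMLParser(target=TextCollector())
# Feeding the input in arbitrary pieces can split one text across several data() calls
for piece in ("<root><msg>Hel", "lo, wor", "ld</msg><msg>bye</msg></root>"):
    parser.feed(piece)
runs = parser.close()
print(runs)  # [('root/msg', 'Hello, world'), ('root/msg', 'bye')]
```

Dropping whitespace-only runs is a choice made for this example, not something the parser requires; keep them if indentation matters to you.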