I am learning Python and trying to understand how the XML parser (ElementTree's XMLParser) behaves.
I modified the example from the documentation:

    class MaxDepth:                     # The target object of the parser
        path = ""

        def start(self, tag, attrib):   # Called for each opening tag.
            self.path += "/" + tag
            print '>>> Entering - ' + self.path

        def end(self, tag):             # Called for each closing tag.
            print '<<< Leaving - ' + self.path
            if self.path.endswith('/' + tag):
                self.path = self.path[:-(len(tag) + 1)]

        def data(self, data):           # Called for each chunk of text data.
            if data:
                print '... data called ...'
                print data, 'length -', len(data)

        def close(self):                # Called when all data has been parsed.
            return self

It prints the following output:
>>> Entering - /a
... data called ...
length - 1
... data called ...
length - 2
>>> Entering - /a/b
... data called ...
length - 1
... data called ...
length - 2
<<< Leaving - /a/b
... data called ...
length - 1
... data called ...
length - 2
>>> Entering - /a/b
... data called ...
length - 1
... data called ...
length - 4
>>> Entering - /a/b/c
... data called ...
length - 1
... data called ...
length - 6
>>> Entering - /a/b/c/d
... data called ...
length - 1
... data called ...
length - 6
<<< Leaving - /a/b/c/d
... data called ...
length - 1
... data called ...
length - 4
<<< Leaving - /a/b/c
... data called ...
length - 1
... data called ...
length - 2
<<< Leaving - /a/b
... data called ...
length - 1
<<< Leaving - /a
<__main__.MaxDepth instance at 0x10e7dd5a8>
My question: where can I find more details about the data method? Is there a Javadoc-style API reference for the XMLParser class?

Answer 0 (score: 2)
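If you modify the data method as follows:

    def data(self, data):
        if data:
            print '... data called ...'
            print repr(data), 'length -', len(data)

you will see why the data method is called multiple times; it is called for each line of text data between the tags: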
>>> Entering - /a
... data called ...
'\n' length - 1
... data called ...
' ' length - 2
>>> Entering - /a/b
... data called ...
'\n' length - 1
... data called ...
' ' length - 2
<<< Leaving - /a/b
... data called ...
'\n' length - 1
... data called ...
' ' length - 2
>>> Entering - /a/b
... data called ...
'\n' length - 1
... data called ...
' ' length - 4
# ... etc ...
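XMLParser is built on the Expat parser.
In my experience, any streaming XML parser delivers text data as a series of chunks, and you have to concatenate all the data events until you reach the next start-tag or end-tag event. Parsers often split the chunks at whitespace boundaries, but that is not guaranteed.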
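
The concatenation described above could be sketched roughly like this. It is only an illustration, written for Python 3: the class name TextCollector, the helper _flush, the texts list and the sample XML are all made up for this example and are not part of ElementTree. The target buffers every data() chunk and joins the buffer whenever the next start or end event arrives.

```python
from xml.etree.ElementTree import XMLParser


class TextCollector(object):
    """Hypothetical parser target: buffers data() chunks until the next tag event."""

    def __init__(self):
        self.path = []    # stack of currently open tags
        self.chunks = []  # text pieces received since the last tag event
        self.texts = []   # (path, text) pairs, one per completed text run

    def _flush(self):
        # Join everything buffered so far into one string and record it.
        if self.chunks:
            text = "".join(self.chunks)
            self.chunks = []
            if text.strip():  # ignore whitespace-only runs between tags
                self.texts.append(("/" + "/".join(self.path), text.strip()))

    def start(self, tag, attrib):
        self._flush()  # text before an opening tag belongs to the parent
        self.path.append(tag)

    def end(self, tag):
        self._flush()  # text before a closing tag belongs to this element
        self.path.pop()

    def data(self, data):
        # May be called several times for one run of text; just buffer it.
        self.chunks.append(data)

    def close(self):
        return self.texts


# Sample input invented for this sketch.
parser = XMLParser(target=TextCollector())
parser.feed("<a>\n  <b>first\nsecond</b>\n  <b>third</b>\n</a>")
result = parser.close()  # returns whatever the target's close() returns
print(result)
```

Because the chunks are joined before they are used, the result does not depend on where the parser happens to split the text.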