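I'm extracting information from a large XML file using Python and lxml's target parser interface. I'd like to be able to set a limit after which parsing stops. Here is some code:
The parser target code: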
class TitleTarget(object):
    def __init__(self, limit=None):
        self.limit = limit
        self.counter = 0

    def start(self, tag, attrib):
        if self.limit and self.counter > self.limit:
            #### BREAK HERE ####
            return False
        #doProcessing(attrib)
        self.counter = self.counter + 1

    def end(self, tag):
        pass

    def data(self, data):
        pass

    def close(self):
        pass
The code that starts the parsing:
parser = etree.XMLParser(target = TitleTarget(limit))
etree.parse(file, parser)
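I know that processing reaches the "BREAK HERE" line, but I haven't found any way to stop the parsing. I've tried returning True, False, and [], and raising an Error, but nothing seems to work. It always processes until the end of the file.
Is there a way to stop processing with this approach?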
Answer 0 (score: 1)
Instead of using etree.parse(file, parser), you could loop over the lines in file and call parser.feed on each line. This gives you control over when to break out of the loop.
Now you can simply set self.done=True in the target, and then test target.done in the feed loop:
import lxml.etree as ET

class HaltingTarget(object):
    def __init__(self, limit=None):
        self.done = False
        self.limit = limit
        self.counter = 0
        self.result = []

    def start(self, tag, attrib):
        # Once the limit is exceeded, flag it so the feed loop can stop.
        if self.limit and self.counter > self.limit:
            self.done = True
            return
        if attrib:
            self.result.append(attrib)
        self.counter += 1

    def end(self, tag):
        pass

    def data(self, data):
        pass

    def comment(self, text):
        pass

    def close(self):
        return

def halt_parser():
    content = '''\
<node1>
<Title a1="x1"> My Title </Title>
<node2 a1="x2"> ... </node2>
<node2 a1="x1"> ... </node2>
</node1>
'''
    # No limit is passed here, so the whole document is processed.
    target = HaltingTarget()
    parser = ET.XMLParser(target=target)
    for line in content.splitlines():
        parser.feed(line.strip())
        if target.done:
            break
    # We can't call parser.close() since the XML we've fed it is probably
    # incomplete. We don't plan to use `parser` anymore, so delete it.
    del parser
    print(target.result)
    # [{'a1': u'x1'}, {'a1': u'x2'}, {'a1': u'x1'}]

halt_parser()
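The example above never passes a limit, so it does not actually halt. Below is a minimal sketch of the same feed-loop idea applied to a file-like object read in fixed-size chunks, with a limit in effect. Several things here are assumptions rather than part of the answer: the helper name `collect_attribs`, the `chunk_size` parameter, the choice to count collected attribute dictionaries instead of elements, and the fallback to the standard library's `xml.etree.ElementTree` (whose `XMLParser` also accepts a `target` and provides `feed`) when lxml is not installed. Because one chunk can contain several start tags, the target ignores events once `done` is set.

```python
import io

try:
    import lxml.etree as ET
except ImportError:
    # Assumption: fall back to the stdlib parser, which offers the same
    # target/feed interface used below.
    import xml.etree.ElementTree as ET


class HaltingTarget(object):
    def __init__(self, limit=None):
        self.done = False
        self.limit = limit
        self.result = []

    def start(self, tag, attrib):
        # A single chunk may trigger several start events, so ignore
        # everything that arrives after the limit has been reached.
        if self.done:
            return
        if attrib:
            self.result.append(dict(attrib))
        if self.limit is not None and len(self.result) >= self.limit:
            self.done = True

    def end(self, tag):
        pass

    def data(self, data):
        pass

    def close(self):
        return self.result


def collect_attribs(stream, limit=None, chunk_size=64):
    """Hypothetical helper: feed `stream` chunk by chunk until the target halts."""
    target = HaltingTarget(limit)
    parser = ET.XMLParser(target=target)
    while not target.done:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)
    if not target.done:
        # The whole document was fed, so closing the parser is safe here.
        parser.close()
    return target.result


xml_bytes = (b'<node1><Title a1="x1">My Title</Title>'
             b'<node2 a1="x2">...</node2><node2 a1="x3">...</node2></node1>')
first_two = collect_attribs(io.BytesIO(xml_bytes), limit=2, chunk_size=16)
everything = collect_attribs(io.BytesIO(xml_bytes), chunk_size=16)
```

For a real file, the `io.BytesIO` object would be replaced by a file opened in binary mode. As in the answer, the parser is simply discarded after an early halt, since the XML fed so far is incomplete.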