这类似于question I've asked before。
我为包含多个日志的文本文件编写了一个pyparsing语法logparser
。日志记录每个函数调用和每个函数完成。基础进程是多线程的,因此有可能先调用慢速函数A
,然后又调用快速函数B
并几乎立即完成,然后在该函数A
完成并给出我们的返回值。因此,手工读取日志文件非常困难,因为一个功能的调用信息和返回值信息可能相隔数千行。
我的解析器能够解析函数调用(从现在开始称为input_blocks
)及其返回值(从现在开始称为output_blocks
)。我的解析结果(logparser.searchString(logfile)
)如下:
[0]: # first log
- input_blocks:
[0]:
- func_name: 'Foo'
- parameters: ...
- thread: '123'
- timestamp_in: '12:01'
[1]:
- func_name: 'Bar'
- parameters: ...
- thread: '456'
- timestamp_in: '12:02'
- output_blocks:
[0]:
- func_name: 'Bar'
- func_time: '1'
- parameters: ...
- thread: '456'
- timestamp_out: '12:03'
[1]:
- func_name: 'Foo'
- func_time: '3'
- parameters: ...
- thread: '123'
- timestamp_out: '12:04'
[1]: # second log
- input_blocks:
...
- output_blocks:
...
... # n-th log
我想解决一个函数调用的输入和输出信息分离的问题。因此,我想将input_block
和对应的output_block
放入function_block
中。我的最终解析结果应如下所示:
[0]: # first log
- function_blocks:
[0]:
- input_block:
- func_name: 'Foo'
- parameters: ...
- thread: '123'
- timestamp_in: '12:01'
- output_block:
- func_name: 'Foo'
- func_time: '3'
- parameters: ...
- thread: '123'
- timestamp_out: '12:04'
[1]:
- input_block:
- func_name: 'Bar'
- parameters: ...
- thread: '456'
- timestamp_in: '12:02'
- output_block:
- func_name: 'Bar'
- func_time: '1'
- parameters: ...
- thread: '456'
- timestamp_out: '12:03'
[1]: # second log
- function_blocks:
[0]: ...
[1]: ...
... # n-th log
为此,我定义了一个函数rearrange
,该函数遍历input_blocks
和output_blocks
并检查func_name
,thread
和时间戳是否匹配。但是,将匹配的块移动到一个function_block
中是我缺少的部分。然后,我将此函数设置为日志语法的解析操作:logparser.setParseAction(rearrange)
def rearrange(log_token):
for input_block in log_token.input_blocks:
for output_block in log_token.output_blocks:
if (output_block.func_name == input_block.func_name
and output_block.thread == input_block.thread
and check_timestamp(output_block.timestamp_out,
output_block.func_time,
input_block.timestamp_in):
# output_block and input_block match -> put them in a function_block
# modify log_token
return log_token
我的问题是:如何将匹配的output_block
和input_block
放在function_block
中,使我仍然喜欢以下便捷的访问方法: pyparsing.ParseResults
?
我的想法如下:
def rearrange(log_token):
# define a new ParseResults object in which I store matching input & output blocks
function_blocks = pp.ParseResults(name='function_blocks')
# find matching blocks
for input_block in log_token.input_blocks:
for output_block in log_token.output_blocks:
if (output_block.func_name == input_block.func_name
and output_block.thread == input_block.thread
and check_timestamp(output_block.timestamp_out,
output_block.func_time,
input_block.timestamp_in):
# output_block and input_block match -> put them in a function_block
function_blocks.append(input_block.pop() + output_block.pop()) # this addition causes a maximum recursion error?
log_token.append(function_blocks)
return log_token
但这不起作用。加法会导致最大的递归错误,并且.pop()
不能按预期工作。它不会弹出整个块,而只会弹出该块中的最后一个条目。此外,它实际上也不会删除该条目,只是将其从列表中删除,但仍可以通过其结果名称访问它。
某些input_blocks
可能没有相应的output_block
(例如,如果进程在所有功能完成之前崩溃了),这也是有可能的。因此,我的解析结果应具有属性input_blocks
,output_blocks
(用于备用块)和function_blocks
(用于匹配块)。
感谢您的帮助!
编辑:
我做了一个简单的例子来说明我的问题。另外,我进行了实验,并找到了一种解决方案,但是有点混乱。我必须承认其中包含很多尝试和错误,因为我既没有找到关于ParseResults
的文档,也无法理解ParseResults
的内部工作原理以及如何正确创建自己的嵌套from pyparsing import *
def main():
log_data = '''\
Func1_in
Func2_in
Func2_out
Func1_out
Func3_in'''
ParserElement.inlineLiteralsUsing(Suppress)
input_block = Group(Word(alphanums)('func_name') + '_in').setResultsName('input_blocks', listAllMatches=True)
output_block = Group(Word(alphanums)('func_name') +'_out').setResultsName('output_blocks', listAllMatches=True)
log = OneOrMore(input_block | output_block)
parse_results = log.parseString(log_data)
print('***** before rearranging *****')
print(parse_results.dump())
parse_results = rearrange(parse_results)
print('***** after rearranging *****')
print(parse_results.dump())
def rearrange(log_token):
function_blocks = list()
for input_block in log_token.input_blocks:
for output_block in log_token.output_blocks:
if input_block.func_name == output_block.func_name:
# found two matching blocks! now put them in a function_block
# and delete them from their original positions in log_token
# I have to do both __setitem__ and .append so it shows up in the dict and in the list
# and .copy() is necessary because I delete the original objects later
tmp_function_block = ParseResults()
tmp_function_block.__setitem__('input', input_block.copy())
tmp_function_block.append(input_block.copy())
tmp_function_block.__setitem__('output', output_block.copy())
tmp_function_block.append(output_block.copy())
function_block = ParseResults(name='function_blocks', toklist=tmp_function_block, asList=True,
modal=False) # I have no idea what modal and asList do, this was trial-and-error until I got acceptable output
del function_block['input'], function_block['output'] # remove duplicate data
function_blocks.append(function_block)
# delete from original position in log_token
input_block.clear()
output_block.clear()
log_token.__setitem__('function_blocks', sum(function_blocks))
return log_token
if __name__ == '__main__':
main()
结构。
***** before rearranging *****
[['Func1'], ['Func2'], ['Func2'], ['Func1'], ['Func3']]
- input_blocks: [['Func1'], ['Func2'], ['Func3']]
[0]:
['Func1']
- func_name: 'Func1'
[1]:
['Func2']
- func_name: 'Func2'
[2]:
['Func3']
- func_name: 'Func3'
- output_blocks: [['Func2'], ['Func1']]
[0]:
['Func2']
- func_name: 'Func2'
[1]:
['Func1']
- func_name: 'Func1'
***** after rearranging *****
[[], [], [], [], ['Func3']]
- function_blocks: [['Func1'], ['Func1'], ['Func2'], ['Func2'], [], []] # why is this duplicated? I just want the inner function_blocks!
- function_blocks: [[['Func1'], ['Func1']], [['Func2'], ['Func2']], [[], []]]
[0]:
[['Func1'], ['Func1']]
- input: ['Func1']
- func_name: 'Func1'
- output: ['Func1']
- func_name: 'Func1'
[1]:
[['Func2'], ['Func2']]
- input: ['Func2']
- func_name: 'Func2'
- output: ['Func2']
- func_name: 'Func2'
[2]: # where does this come from?
[[], []]
- input: []
- output: []
- input_blocks: [[], [], ['Func3']]
[0]: # how do I delete these indexes?
[] # I think I only cleared their contents
[1]:
[]
[2]:
['Func3']
- func_name: 'Func3'
- output_blocks: [[], []]
[0]:
[]
[1]:
[]
输出:
sudo apt install libmysqlcppconn-dev
答案 0 :(得分:1)
此版本的rearrange
解决了我在您的示例中看到的大多数问题:
def rearrange(log_token):
function_blocks = list()
for input_block in log_token.input_blocks:
# look for match among output blocks that have not been cleared
for output_block in filter(None, log_token.output_blocks):
if input_block.func_name == output_block.func_name:
# found two matching blocks! now put them in a function_block
# and clear them from in their original positions in log_token
# create rearranged block, first with a list of the two blocks
# instead of append()'ing, just initialize with a list containing
# the two block copies
tmp_function_block = ParseResults([input_block.copy(), output_block.copy()])
# now assign the blocks by name
# x.__setitem__(key, value) is the same as x[key] = value
tmp_function_block['input'] = tmp_function_block[0]
tmp_function_block['output'] = tmp_function_block[1]
# wrap that all in another ParseResults, as if we had matched a Group
function_block = ParseResults(name='function_blocks', toklist=tmp_function_block, asList=True,
modal=False) # I have no idea what modal and asList do, this was trial-and-error until I got acceptable output
del function_block['input'], function_block['output'] # remove duplicate name references
function_blocks.append(function_block)
# clear blocks in their original positions in log_token, so they won't be matched any more
input_block.clear()
output_block.clear()
# match found, no need to keep going looking for a matching output block
break
# find all input blocks that weren't cleared (had matching output blocks) and append as input-only blocks
for input_block in filter(None, log_token.input_blocks):
# no matching output for this input
tmp_function_block = ParseResults([input_block.copy()])
tmp_function_block['input'] = tmp_function_block[0]
function_block = ParseResults(name='function_blocks', toklist=tmp_function_block, asList=True,
modal=False) # I have no idea what modal and asList do, this was trial-and-error until I got acceptable output
del function_block['input'] # remove duplicate data
function_blocks.append(function_block)
input_block.clear()
# clean out log_token, and reload with rearranged function blocks
log_token.clear()
log_token.extend(function_blocks)
log_token['function_blocks'] = sum(function_blocks)
return log_token
由于这将获取输入令牌并返回重新排列的令牌,因此您可以 将其原样进行解析操作:
# trailing '*' on the results name is equivalent to listAllMatches=True
input_block = Group(Word(alphanums)('func_name') + '_in')('input_blocks*')
output_block = Group(Word(alphanums)('func_name') +'_out')('output_blocks*')
log = OneOrMore(input_block | output_block)
log.addParseAction(rearrange)
由于rearrange
已在log_token
位置进行了更新,因此,如果您将其设为解析动作,则不需要结尾return
语句。
有趣的是,您如何能够通过清除找到匹配的块来就地更新列表-非常聪明。
通常,将令牌组装到ParseResults
中是一个内部函数,因此文档仅涉及此主题。我只是在浏览模块文档,而对于这个主题我并没有真正的好主意。