Parsing a multi-line, variable-length log file

Date: 2010-02-02 05:51:36

Tags: regex search parsing logging grep

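I would like to be able to use a 'grep' or 'pcregrep -M' solution to parse a log file that fits the following parameters:

  • Each log entry can be multiple lines long
  • The first line of a log entry contains the key I want to search for
  • Each key appears on more than one line

So in the example below I would like to return every line that contains KEY1, plus all the supporting lines below it, up to the next log message.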

Log file:
01 Feb 2010 - 10:39:01.755, DEBUG - KEY1:randomtext
        blah
        blah2 T
        blah3 T
        blah4 F
        blah5 F
        blah6
        blah7
01 Feb 2010 - 10:39:01.757, DEBUG - KEY1:somethngelse
01 Feb 2010 - 10:39:01.758, DEBUG - KEY2:randomtest
this is a test
01 Feb 2010 - 10:39:01.760, DEBUG - KEY1:more logs here
01 Feb 2010 - 10:39:01.762, DEBUG - KEY1:eve more here
this is another multiline log entry
keeps on going
but not as long as before
01 Feb 2010 - 10:39:01.763, DEBUG - KEY2:testing
test test test
end of key2
01 Feb 2010 - 10:39:01.762, DEBUG - KEY1:but key 1 is still going
and going
and going
and going
and going
and going
and going
and going
and going
and going
and going
and going
and going
okay enough
01 Feb 2010 - 10:39:01.762, DEBUG - KEY3:and so on
and on
Desired output of searching for KEY1:
01 Feb 2010 - 10:39:01.755, DEBUG - KEY1:randomtext
        blah
        blah2 T
        blah3 T
        blah4 F
        blah5 F
        blah6
        blah7
01 Feb 2010 - 10:39:01.757, DEBUG - KEY1:somethngelse

01 Feb 2010 - 10:39:01.760, DEBUG - KEY1:more logs here
01 Feb 2010 - 10:39:01.762, DEBUG - KEY1:eve more here
this is another multiline log entry
keeps on going
but not as long as before
01 Feb 2010 - 10:39:01.762, DEBUG - KEY1:but key 1 is still going
and going
and going
and going
and going
and going
and going
and going
and going
and going
and going
and going
and going
okay enough

I tried doing something like:
pcregrep -M 'KEY1(.*\n)+' logfile
but it definitely doesn't work.
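The attempt above does not stop at the next log message because `(.*\n)+` is greedy: once `KEY1` matches, it keeps consuming every following line. One way around a multi-line regex is to treat each line that starts with a timestamp as the header of a new entry and let awk carry a print flag from one header to the next. The sketch below assumes every header begins like "01 Feb 2010 - 10:39:01.755" and tests the key on header lines only, so a continuation line that happens to mention KEY1 does not start printing. `filter_entries` is a hypothetical helper name.

```bash
# Hypothetical helper: print every entry whose header line matches the key.
# Assumption: a header starts with "DD Mon YYYY - "; all other lines belong
# to the entry above them.
filter_entries() {
    local key=$1 file=$2
    awk -v key="$key" '
        # Header line: turn printing on if it holds the key, off otherwise
        /^[0-9][0-9] [A-Za-z][A-Za-z][A-Za-z] [0-9][0-9][0-9][0-9] - / {
            printing = ($0 ~ key)
        }
        # Print the header and its continuation lines while the flag is set
        printing
    ' "$file"
}
```

Usage: `filter_entries KEY1 logfile`. Explicit character classes are used instead of `{2}`-style intervals because some older awk implementations do not support interval expressions.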

3 Answers:

Answer 0 (score: 8)

If you are on *nix, you can use the shell:

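#!/bin/bash
read -r -p "Enter key: " key
awk -v key="$key" '
# A DEBUG line without the key starts an entry we do not want: stop printing
$0 ~ /DEBUG/ && $0 !~ key { f = 0 }
# Any line containing the key starts (or keeps) printing
$0 ~ key { f = 1 }
# Print the current line while the flag is set
f { print }
' file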
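In the script above, `key` is used as a regular expression, so a search for KEY1 also matches headers such as KEY10 or KEY12, and characters like `.` or `*` in a key are read as regex syntax. A minimal variant, assuming the "DEBUG - KEY:" header layout from the question, compares a fixed string with awk's `index()` instead. `print_key_entries` is a hypothetical helper name.

```bash
# Hypothetical helper: same DEBUG-based flag idea as the script above, but
# the key is matched as the literal text "DEBUG - <key>:" rather than a regex.
print_key_entries() {
    local key=$1 file=$2
    awk -v key="$key" '
        # DEBUG marks a header: print this entry only if its key field is
        # exactly the requested key (so KEY1 does not match KEY10)
        /DEBUG/ { f = (index($0, "DEBUG - " key ":") > 0) }
        f
    ' "$file"
}
```

Usage: `print_key_entries KEY1 file`.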

Output:

$ cat file
01 Feb 2010 - 10:39:01.755, DEBUG - KEY1:randomtext
        blah
        blah2 T
        blah3 T
        blah4 F
        blah5 F
        blah6
        blah7
01 Feb 2010 - 10:39:01.757, DEBUG - KEY1:somethngelse
01 Feb 2010 - 10:39:01.758, DEBUG - KEY2:randomtest
this is a test
01 Feb 2010 - 10:39:01.760, DEBUG - KEY1:more logs here
01 Feb 2010 - 10:39:01.762, DEBUG - KEY1:eve more here
this is another multiline log entry
keeps on going
but not as long as before
01 Feb 2010 - 10:39:01.763, DEBUG - KEY2:testing
test test test
end of key2
01 Feb 2010 - 10:39:01.762, DEBUG - KEY1:but key 1 is still going
and going
and going
and going
and going
and going
and going
and going
and going
and going
and going
and going
and going
okay enough
01 Feb 2010 - 10:39:01.762, DEBUG - KEY3:and so on
and on

$ ./shell.sh
Enter key: KEY1
01 Feb 2010 - 10:39:01.755, DEBUG - KEY1:randomtext
        blah
        blah2 T
        blah3 T
        blah4 F
        blah5 F
        blah6
        blah7
01 Feb 2010 - 10:39:01.757, DEBUG - KEY1:somethngelse
01 Feb 2010 - 10:39:01.760, DEBUG - KEY1:more logs here
01 Feb 2010 - 10:39:01.762, DEBUG - KEY1:eve more here
this is another multiline log entry
keeps on going
but not as long as before
01 Feb 2010 - 10:39:01.762, DEBUG - KEY1:but key 1 is still going
and going
and going
and going
and going
and going
and going
and going
and going
and going
and going
and going
and going
okay enough

Answer 1 (score: 0)

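I had a similar requirement and decided to write a small tool (in .NET) that parses log files for me and writes the results to standard output.

Maybe you will find it useful. It works on Windows and on Linux (Mono).

See here: https://github.com/iohn2000/ParLog

A tool that filters log files for log entries containing a specific (regex) pattern. It also works with multi-line log entries, for example to show only the log entries of a certain workflow instance. It writes the results to standard output; use '>' to redirect them to a file.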

The default startPattern is:

^[0-9]{2} [\w]{3} [0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}

This corresponds to a date format such as: 04 Feb 2017 15:02:50,778

The parameters are:

f:wildcard      a file name or wildcard for multiple files
p:pattern       the regex pattern to filter the file(s)
s:startPattern  regex pattern to define when a new log entry starts

Example:

ParLog.exe -f=*.log -p=findMe
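ParLog's own code is not reproduced here; the following is only a minimal awk sketch of the idea that the startPattern/pattern parameters describe, assuming "contains a pattern" means that any line of an entry may match. `filter_log_entries` is a hypothetical name. Note that the default startPattern above expects timestamps like "04 Feb 2017 15:02:50,778" and would not recognise the question's headers ("01 Feb 2010 - 10:39:01.755,") because of the " - " after the year and the "." before the milliseconds, so the usage below passes a start pattern written for the question's format.

```bash
# Hypothetical helper: lines matching START begin a new entry; an entry is
# printed in full if ANY of its lines matches PATTERN.
# Patterns are passed through ENVIRON because awk -v would process escape
# sequences and turn a regex such as "01\.760" into "01.760".
filter_log_entries() {
    local start=$1 pattern=$2
    shift 2
    START="$start" PATTERN="$pattern" awk '
        # Print the buffered entry if one of its lines matched, then reset
        function flush() {
            if (matched) printf "%s", entry
            entry = ""; matched = 0
        }
        # A new file or a start line closes the previous entry
        FNR == 1 || $0 ~ ENVIRON["START"] { flush() }
        $0 ~ ENVIRON["PATTERN"] { matched = 1 }
        { entry = entry $0 "\n" }
        END { flush() }
    ' "$@"
}
```

Usage: `filter_log_entries '^[0-9][0-9] [A-Za-z][A-Za-z][A-Za-z] [0-9][0-9][0-9][0-9] - ' 'KEY1' *.log`. Buffering each entry costs memory proportional to the longest entry, which is the price of matching on lines after the header.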

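Answer 2 (score: -1)

Adding to ghostdog74's answer (thanks a lot, btw, it works great).

It now takes its arguments on the command line in the form "./parse file key", and handles the ERROR log level as well as DEBUG.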

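#!/bin/bash
# Usage: ./parse file key
awk -v key="$2" '
# A DEBUG or ERROR line without the key ends the entry being printed
$0 ~ /DEBUG|ERROR/ && $0 !~ key { f = 0 }
$0 ~ key { f = 1 }
f { print }
' "$1"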
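The script above only resets its flag on DEBUG or ERROR lines, so an entry at another level (for example INFO or WARN) is treated as a continuation of whatever entry came before it. A minimal variant, assuming every header contains ", LEVEL - " with an upper-case level name as in the question, keeps the same flag logic but recognises any level. `parse_log` is a hypothetical function name.

```bash
# Hypothetical helper: the same flag logic as the script above, with the
# fixed DEBUG|ERROR list replaced by any ", LEVEL - " header.
parse_log() {
    local file=$1 key=$2
    awk -v key="$key" '
        # Any header without the key ends the entry being printed
        /, [A-Z]+ - / && $0 !~ key { f = 0 }
        # A line containing the key starts printing, as before
        $0 ~ key { f = 1 }
        f
    ' "$file"
}
```

Usage: `parse_log logfile KEY1`.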