Consider the following Tomcat log structure:
[06/Feb/2013:15:25:27 +0000] [Thread-10] DEBUG xxx.yyy.xxx.yyy.xxx.yyy.BlahBlahClass - Reloading blah configuration: /somepath/xxx.yyy
[06/Feb/2013:15:25:27 +0000] [Thread-11] ERROR xxx.yyy.xxx.yyy.xxx.yyy.BlahBlahClass2 - [xxx.yyy] - Could not find the somethinh
[06/Feb/2013:15:25:27 +0000] [Thread-12] ERROR xxx.yyy.xxx.yyy.xxx.yyy - error handling product : xxx.yyy don't know where it is
xxx.yyy.IOException: Could not find the feed with id [thisisfeedname_13601429613239870] in the feed repository or as a what?
at xxx.yyy.xxx.yyy.xxx.yyy.xxx.yyy(xxx.yyy:57)
at xxx.yyy.xxx.yyy.xxx.yyy.xxx.yyy(xxx.yyy:65)
at xxx.yyy.xxx.yyy.xxx.yyy.flush(xxx.yyy:294)
at xxx.yyy.xxx.yyy.DelayedLogger$xxx.yyy(Unknown Source)
Caused by: xxx.yyy.FileNotFoundException: /path/to/feeds/xxx.yyy (No such file or directory)
at xxx.yyy.xxx.yyy(Native Method)
at xxx.yyy.FileInputStream.<init>(xxx.yyy:120)
at xxx.yyy.xxx.yyy.xxx.yyy.parse(xxx.yyy:248)
at xxx.yyy.xxx.yyy$xxx.yyy(Unknown Source)
at xxx.yyy.xxx.yyy.xxx.yyy.xxx.yyy(xxx.yyy:41)
at xxx.yyy.xxx.yyy.xxx.yyy.xxx.yyy(xxx.yyy:13)
at xxx.yyy.xxx.yyy.xxx.yyy.xxx.yyy(xxx.yyy:54)
at xxx.yyy.xxx.yyy.xxx.yyy(xxx.yyy:176)
at xxx.yyy.xxx.yyy.xxx.yyy(xxx.yyy:151)
at xxx.yyy.xxx.yyy.xxx.yyy(xxx.yyy:143)
at xxx.yyy.xxx.yyy.xxx.yyy(xxx.yyy:127)
at xxx.yyy.xxx.yyy.xxx.yyy.xxx.yyy(xxx.yyy:63)
at xxx.yyy.xxx.yyy.xxx.yyy.xxx.yyy(xxx.yyy:43)
... 3 more
[06/Feb/2013:15:25:27 +0000] [Thread-13] INFO xxx.yyy.xxx.yyy.xxx.yyy - constructing a new CSV feed resource
[06/Feb/2013:15:25:27 +0000] [Thread-14] DEBUG xxx.yyy.xxx.yyy.xxx.yyy.xxx.yyy - number of feeds defined for the resource: 1
[06/Feb/2013:15:25:27 +0000] [Thread-15] INFO xxx.yyy.xxx.yyy.xxx.yyy - constructing a new CSV feed resource
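Each log event consists of a report line that begins with a timestamp in square brackets, optionally followed by a stack trace. For example, the Thread-12 event is followed by a stack trace, while the other events (threads 10, 11, and 13 through 15) are not.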
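I would like to convert each log event into a Python object with a timestamp, a level (ERROR, INFO, etc.), a message, and an optional stack trace. I tried the following regular expression: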
import re
reg_str = r'^\[(.*?)\]\s+\[(.*?)\]\s+(\w+)\s*(.*)\s*$\s*(([^\[].*?$)*)'
reg = re.compile(reg_str, re.MULTILINE)
Alas, whenever an event has a stack trace, the regex greedily matches everything up to the end of the log.
How can I rewrite the regex so that it matches log events correctly?
Answer 0 (score: 2)
First, make the greedy part non-greedy ;)
^\[(.*?)\]\s+\[(.*?)\]\s+(\w+)\s*(.*?)\s*$\s*(([^\[].*?$)*)
                                    ^
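However, as you can see in the link, the rest of the regex still has problems. First, you need to move the final \s* inside the parentheses, because the optional lines may be indented. Second, you need a negative lookahead rather than a negated character class, for reasons that would complicate this answer (though I can explain in the comments if you like). Like this: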
^\[(.*?)\]\s+\[(.*?)\]\s+(\w+)\s*(.*?)\s*$((\s*(?!\[).*?$)*)
                                          ^^^^^^^^^^^
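One way to see the difference between the two approaches is to run both patterns on a tiny input. The sketch below is only an illustration: the three-line `sample` is made up, and the variable names are not from the question. It relies on the fact that `[^\[]` matches any character other than `[`, including a newline, whereas `(?!\[)` only inspects the next character without consuming it.

```python
import re

# Hypothetical sample: one event with a one-line "stack trace", then a second event.
sample = (
    "[t1] [Thread-1] ERROR boom\n"
    "at foo\n"
    "[t2] [Thread-2] INFO second"
)

# Non-greedy message, but still a negated character class for continuation lines.
with_char_class = re.compile(
    r'^\[(.*?)\]\s+\[(.*?)\]\s+(\w+)\s*(.*?)\s*$\s*(([^\[].*?$)*)', re.MULTILINE)

# Same idea with \s* moved inside the group and a negative lookahead.
with_lookahead = re.compile(
    r'^\[(.*?)\]\s+\[(.*?)\]\s+(\w+)\s*(.*?)\s*$((\s*(?!\[).*?$)*)', re.MULTILINE)

events_char_class = [m.group(0) for m in with_char_class.finditer(sample)]
events_lookahead = [m.group(0) for m in with_lookahead.finditer(sample)]

# [^\[] consumes the newline before "[t2]", so .*? then runs across that line.
print(len(events_char_class))  # 1
# The lookahead stops the continuation group in front of "[t2]".
print(len(events_lookahead))   # 2
```

With the character class, the second event is swallowed into the first one's trailing group; with the lookahead, each event is matched separately.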
Finally, the innermost capture group is not particularly useful, so make it non-capturing:
^\[(.*?)\]\s+\[(.*?)\]\s+(\w+)\s*(.*?)\s*$((?:\s*(?!\[).*?$)*)
                                            ^^
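To tie this back to the goal in the question, here is a minimal sketch of how the final pattern might be used to build Python objects. The `LogEvent` dataclass, the `parse_log` function, and the shortened `sample` log are assumptions made for illustration; only the regex itself comes from the answer.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Final pattern from the answer: header fields, message, optional continuation lines.
LOG_EVENT = re.compile(
    r'^\[(.*?)\]\s+\[(.*?)\]\s+(\w+)\s*(.*?)\s*$((?:\s*(?!\[).*?$)*)',
    re.MULTILINE,
)


@dataclass
class LogEvent:
    """Hypothetical container for one parsed log event."""
    timestamp: str
    thread: str
    level: str
    message: str
    stack_trace: Optional[str] = None


def parse_log(text: str) -> List[LogEvent]:
    """Return one LogEvent per header line, attaching any following trace lines."""
    events = []
    for m in LOG_EVENT.finditer(text):
        timestamp, thread, level, message, trace = m.groups()
        trace = trace.strip()  # drop the leading newline captured by \s*
        events.append(LogEvent(timestamp, thread, level, message, trace or None))
    return events


# Shortened, made-up excerpt in the same shape as the log in the question.
sample = """\
[06/Feb/2013:15:25:27 +0000] [Thread-11] ERROR xxx.yyy.BlahBlahClass2 - [xxx.yyy] - Could not find the something
[06/Feb/2013:15:25:27 +0000] [Thread-12] ERROR xxx.yyy - error handling product
xxx.yyy.IOException: Could not find the feed
    at xxx.yyy.flush(xxx.yyy:294)
Caused by: xxx.yyy.FileNotFoundException: /path/to/feeds/xxx.yyy (No such file or directory)
    ... 3 more
[06/Feb/2013:15:25:27 +0000] [Thread-13] INFO xxx.yyy - constructing a new CSV feed resource
"""

events = parse_log(sample)
for event in events:
    print(event.level, event.thread, "trace" if event.stack_trace else "no trace")
```

One limitation of this sketch: a continuation line that itself starts with `[` ends the current event and is not attached to it.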