Matching each message in a multiline log file with a regular expression

Date: 2017-07-01 12:50:18

Tags: java regex logging stack-trace

I have this multiline log file:

INFO 2017-07-01 12:01:56,987 [Thread-1] Class1:15 This is the message 1
DEBUG 2017-07-01 12:01:56,987 [Thread-1] Class2:15 This is the message 2 
  that is multiline!
WARN 2017-07-01 12:01:56,987 [Thread-1] Class3:15 This is a warn message
ERROR 2017-07-01 12:01:56,987 [Thread-1] Class4:15 This is an error with the stacktrace...
my.packkageName.MyException: exception!
   at my.packkageName.Class4.process(Class4.java:11)
   at ...
INFO 2017-07-01 12:01:56,987 [Thread-1] Class1:15 This is another INFO message 

I want a regular expression that matches each message in the log, so that:

group 1: INFO 2017-07-01 12:01:56,987 [Thread-1] Class1:15 This is the message 1

group 2: DEBUG 2017-07-01 12:01:56,987 [Thread-1] Class2:15 This is the message 2 
  that is multiline!

group 3: WARN 2017-07-01 12:01:56,987 [Thread-1] Class3:15 This is a warn message

group 4: ERROR 2017-07-01 12:01:56,987 [Thread-1] Class4:15 This is an error with the stacktrace...
my.packkageName.MyException: exception!
   at my.packkageName.Class4.process(Class4.java:11)
   at ...

This regex only works for single-line messages:

(?:ERROR|DEBUG|INFO|WARN).++
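To see why this falls short, here is a small sketch that runs the pattern above with java.util.regex over the sample log. The class name SingleLineRegexDemo and the helper findAll are illustrative only. Without the DOTALL flag, . does not match line terminators, so every match ends at the end of its first line and the continuation and stack-trace lines are never part of any match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SingleLineRegexDemo {
    // Sample input copied from the question.
    static final String SAMPLE_LOG =
            "INFO 2017-07-01 12:01:56,987 [Thread-1] Class1:15 This is the message 1\n"
            + "DEBUG 2017-07-01 12:01:56,987 [Thread-1] Class2:15 This is the message 2 \n"
            + "  that is multiline!\n"
            + "WARN 2017-07-01 12:01:56,987 [Thread-1] Class3:15 This is a warn message\n"
            + "ERROR 2017-07-01 12:01:56,987 [Thread-1] Class4:15 This is an error with the stacktrace...\n"
            + "my.packkageName.MyException: exception!\n"
            + "   at my.packkageName.Class4.process(Class4.java:11)\n"
            + "   at ...\n"
            + "INFO 2017-07-01 12:01:56,987 [Thread-1] Class1:15 This is another INFO message ";

    // The regex from the question, compiled without flags, so '.' does not match '\n'.
    static final Pattern SINGLE_LINE = Pattern.compile("(?:ERROR|DEBUG|INFO|WARN).++");

    // Collects the text of every match of p in input.
    static List<String> findAll(Pattern p, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Each match stops at the first line terminator, so "  that is multiline!"
        // and the stack-trace lines are missing from the output.
        for (String match : findAll(SINGLE_LINE, SAMPLE_LOG)) {
            System.out.println("[" + match + "]");
        }
    }
}
```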

2 Answers:

Answer 0 (score: 1)

I found the solution.

The regex to use is the following:

/(?:DEBUG|INFO|ERROR|WARN)[\s\S]+?(?=DEBUG|INFO|WARN|ERROR)/gm

It matches, across lines, each "log message" that lies between the words DEBUG, INFO, ERROR or WARN.
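For reference, here is a rough Java translation of this regex run over the sample log (LookaheadRegexDemo and findAll are hypothetical names). The /.../gm delimiters are JavaScript-style syntax: in Java the pattern is passed to Pattern.compile without them, calling find() in a loop plays the role of the g flag, and m has no effect because the pattern uses neither ^ nor $.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadRegexDemo {
    // Sample input copied from the question.
    static final String SAMPLE_LOG =
            "INFO 2017-07-01 12:01:56,987 [Thread-1] Class1:15 This is the message 1\n"
            + "DEBUG 2017-07-01 12:01:56,987 [Thread-1] Class2:15 This is the message 2 \n"
            + "  that is multiline!\n"
            + "WARN 2017-07-01 12:01:56,987 [Thread-1] Class3:15 This is a warn message\n"
            + "ERROR 2017-07-01 12:01:56,987 [Thread-1] Class4:15 This is an error with the stacktrace...\n"
            + "my.packkageName.MyException: exception!\n"
            + "   at my.packkageName.Class4.process(Class4.java:11)\n"
            + "   at ...\n"
            + "INFO 2017-07-01 12:01:56,987 [Thread-1] Class1:15 This is another INFO message ";

    // The answer's regex without the /.../gm delimiters. [\s\S] matches any character,
    // including '\n'; the lookahead requires another level keyword after each match.
    static final Pattern BETWEEN_LEVELS =
            Pattern.compile("(?:DEBUG|INFO|ERROR|WARN)[\\s\\S]+?(?=DEBUG|INFO|WARN|ERROR)");

    // Collects the text of every match of p in input.
    static List<String> findAll(Pattern p, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String match : findAll(BETWEEN_LEVELS, SAMPLE_LOG)) {
            System.out.println("---");
            System.out.println(match);
        }
    }
}
```

On the sample log this finds five matches, and the multiline DEBUG and ERROR messages come out whole, but the last one stops at the word INFO inside "This is another INFO message". If the last message contained no level keyword at all, it would not be matched, since the lookahead needs some keyword after it. Anchoring the level with ^ in MULTILINE mode and also accepting \z in the lookahead, as the other answer does, avoids both problems.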

Answer 1 (score: 0)

Loading the whole log file into a string and searching it with a regex is probably not the most efficient way to process large log files.
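As a rough sketch of a streaming alternative, assuming every message begins on a new line with one of the four levels followed by a space, the log can be read line by line, with continuation lines appended to the current message. StreamingLogSplitter, split and splitAll are hypothetical names, and java.io.StringReader stands in for a real file here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.regex.Pattern;

public class StreamingLogSplitter {
    // Sample input copied from the question.
    static final String SAMPLE_LOG =
            "INFO 2017-07-01 12:01:56,987 [Thread-1] Class1:15 This is the message 1\n"
            + "DEBUG 2017-07-01 12:01:56,987 [Thread-1] Class2:15 This is the message 2 \n"
            + "  that is multiline!\n"
            + "WARN 2017-07-01 12:01:56,987 [Thread-1] Class3:15 This is a warn message\n"
            + "ERROR 2017-07-01 12:01:56,987 [Thread-1] Class4:15 This is an error with the stacktrace...\n"
            + "my.packkageName.MyException: exception!\n"
            + "   at my.packkageName.Class4.process(Class4.java:11)\n"
            + "   at ...\n"
            + "INFO 2017-07-01 12:01:56,987 [Thread-1] Class1:15 This is another INFO message ";

    // Assumption: a line starting with a level and a space begins a new message.
    private static final Pattern MESSAGE_START = Pattern.compile("(?:DEBUG|INFO|WARN|ERROR) ");

    // Reads line by line and passes each complete message to sink, so only the
    // message currently being assembled is kept in memory.
    static void split(Reader in, Consumer<String> sink) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        StringBuilder current = null;
        String line;
        while ((line = reader.readLine()) != null) {
            if (MESSAGE_START.matcher(line).lookingAt()) {
                if (current != null) {
                    sink.accept(current.toString()); // previous message is complete
                }
                current = new StringBuilder(line);
            } else if (current != null) {
                current.append('\n').append(line); // continuation or stack-trace line
            }
            // Lines before the first header are skipped in this sketch.
        }
        if (current != null) {
            sink.accept(current.toString()); // the last message ends at end of input
        }
    }

    // Convenience wrapper for in-memory input.
    static List<String> splitAll(String log) {
        List<String> messages = new ArrayList<>();
        try {
            split(new StringReader(log), messages::add);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return messages;
    }

    public static void main(String[] args) {
        for (String message : splitAll(SAMPLE_LOG)) {
            System.out.println("---");
            System.out.println(message);
        }
    }
}
```

For an actual file, a reader obtained from java.nio.file.Files.newBufferedReader could be passed to split instead. Because readLine drops line terminators, the sketch rejoins lines with \n; that would need adjusting if the original separators matter.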

But if you are fine with using a regex and also want to capture the last message, you can do this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogRegexExample {
    public static void main(String[] args) {
        String logstr = "INFO 2017-07-01 12:01:56,987 [Thread-1] Class1:15 This is the message 1\n"
                + "DEBUG 2017-07-01 12:01:56,987 [Thread-1] Class2:15 This is the message 2 \n"
                + "  that is multiline!\n"
                + "WARN 2017-07-01 12:01:56,987 [Thread-1] Class3:15 This is a warn message\n"
                + "ERROR 2017-07-01 12:01:56,987 [Thread-1] Class4:15 This is an error with the stacktrace...\n"
                + "my.packkageName.MyException: exception!\n"
                + "   at my.packkageName.Class4.process(Class4.java:11)\n"
                + "   at ...\n"
                + "INFO 2017-07-01 12:01:56,987 [Thread-1] Class1:15 This is another INFO message ";

        // Group 1 captures the level. The lazy .+? (which can cross lines thanks to DOTALL)
        // stops before the next line starting with four upper-case letters, or at end of input (\z).
        final Pattern pattern = Pattern.compile("^([A-Z]{4,}).+?(?=(?:^[A-Z]{4}|\\z))",
                Pattern.DOTALL | Pattern.MULTILINE);
        Matcher messages = pattern.matcher(logstr);

        while (messages.find()) {
            System.out.println("---" + messages.group(1)); // the log level
            System.out.println(messages.group(0));        // the whole message
        }
    }
}

Because of Pattern.DOTALL, . also matches line terminators, so .+? can extend across several lines.

With Pattern.MULTILINE, ^ also matches right after any line terminator, except at the end of input.

\z matches only at the very end of the input.
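A small sketch that isolates what each flag does, using a hypothetical three-line string rather than the full log (RegexFlagsDemo, count and firstMatch are illustrative names). It also contrasts \z with $, which in MULTILINE mode matches before every line terminator rather than only at the end of the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFlagsDemo {
    // Hypothetical input: a header line, a continuation line, another header line.
    static final String SAMPLE = "WARN first\n  continued\nINFO second";

    // Number of matches of p in SAMPLE.
    static int count(Pattern p) {
        Matcher m = p.matcher(SAMPLE);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Text of the first match of p in SAMPLE, or null if there is none.
    static String firstMatch(Pattern p) {
        Matcher m = p.matcher(SAMPLE);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Without MULTILINE, ^ matches only at the start of the input: 1 match ("WARN").
        System.out.println(count(Pattern.compile("^[A-Z]{4}")));
        // With MULTILINE, ^ also matches after each line terminator: 2 matches ("WARN", "INFO").
        System.out.println(count(Pattern.compile("^[A-Z]{4}", Pattern.MULTILINE)));
        // Without DOTALL, '.' stops at '\n', so this prints "WARN first".
        System.out.println(firstMatch(Pattern.compile("WARN.+")));
        // With DOTALL, '.' matches '\n' as well, so the match runs to the end of the input.
        System.out.println(firstMatch(Pattern.compile("WARN.+", Pattern.DOTALL)));
        // In MULTILINE mode $ matches before a line terminator, but \z only at the very end.
        System.out.println(count(Pattern.compile("first$", Pattern.MULTILINE)));   // 1
        System.out.println(count(Pattern.compile("first\\z", Pattern.MULTILINE))); // 0
        System.out.println(count(Pattern.compile("second\\z")));                   // 1
    }
}
```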