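Regex groups spanning multiple lines

Time: 2015-07-30 17:19:51

Tags: regex

I need to write a regular expression that groups log lines together, with each group running up to the next INFO.

For example: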

INFO 2015-07-30 06:50:48,208 Request: POST: /api/v1/jobs/
Request Data: {u'job_id': u'15488'}
Resp Status: 200
Resp Data: {'detail': 'ok'}
Resp Time: 0.0139939785004
INFO 2015-07-30 06:50:48,571 Request: POST: /api/v1/jobs/
Request Data: {u'job_id': u'15232 195049139026\r\n'}
Exception Raised: NOTFOUND
Resp Status: 200
Resp Data: {'detail': 'ok'}
Resp Time: 0.0570251941681

After applying the regex, the groups would be:

INFO 2015-07-30 06:50:48,208 Request: POST: /api/v1/jobs/
Request Data: {u'job_id': u'15488'}
Resp Status: 200
Resp Data: {'detail': 'ok'}
Resp Time: 0.0139939785004
INFO 2015-07-30 06:50:48,571 Request: POST: /api/v1/jobs/
Request Data: {u'job_id': u'15232 195049139026\r\n'}
Exception Raised: NOTFOUND
Resp Status: 200
Resp Data: {'detail': 'ok'}
Resp Time: 0.0570251941681

2 answers:

Answer 0: (score: 2)

/INFO(?:\n(?!INFO)|.)*/g

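This should do it: Demo

The regex matches INFO, followed by a non-capturing group ((?:…)) that matches either a newline (\n) not followed by another INFO ((?!INFO)), or (|) any character (.), repeated any number of times (*).

You might think "INFO followed by anything, repeated" would be enough, but unfortunately that matches the whole string as one giant result, so the negative lookahead (?!INFO) is required.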
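
As a rough illustration, the same pattern can be tried with Python's re module. This is only a sketch: the choice of Python, the names LOG, BLOCK_RE and blocks, and the shortened log sample are assumptions made for the example, not part of the answer.

```python
import re

# Shortened sample in the same shape as the log in the question (assumed data).
LOG = (
    "INFO 2015-07-30 06:50:48,208 Request: POST: /api/v1/jobs/\n"
    "Request Data: {u'job_id': u'15488'}\n"
    "Resp Status: 200\n"
    "INFO 2015-07-30 06:50:48,571 Request: POST: /api/v1/jobs/\n"
    "Exception Raised: NOTFOUND\n"
    "Resp Status: 200\n"
)

# INFO, then repeatedly: a newline that does not start a new INFO line,
# or any non-newline character ('.' does not match '\n' by default).
BLOCK_RE = re.compile(r"INFO(?:\n(?!INFO)|.)*")

# With no capturing groups, findall returns the full text of each match.
blocks = BLOCK_RE.findall(LOG)

for number, block in enumerate(blocks, start=1):
    print("--- block", number, "---")
    print(block)
```

Each match stops at the newline that precedes the next INFO line, so that newline is not part of the block; the last block keeps its trailing newline because nothing follows it.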

Answer 1: (score: 2)

This grep command matches each of these blocks separately:

grep -zoP '(?s)INFO.+?\n(?=(INFO|$))' file

-o              # print only the matched part
-z              # treat input as lines terminated by a NUL byte, so the whole file is one record
-P              # use PCRE regex
(?s)            # DOTALL, so that dot matches newlines as well
INFO.+?\n       # match INFO and 1 or more of any character (non-greedy) up to a newline
(?=(INFO|$))    # lookahead: the next characters must be INFO or the end of the file

Output:

INFO 2015-07-30 06:50:48,208 Request: POST: /api/v1/jobs/
Request Data: {u'job_id': u'15488'}
Resp Status: 200
Resp Data: {'detail': 'ok'}
Resp Time: 0.0139939785004

INFO 2015-07-30 06:50:48,571 Request: POST: /api/v1/jobs/
Request Data: {u'job_id': u'15232 195049139026\r\n'}
Exception Raised: NOTFOUND
Resp Status: 200
Resp Data: {'detail': 'ok'}
Resp Time: 0.0570251941681
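
For readers without GNU grep, the lazy match plus lookahead idea can be sketched in Python as well. This is an approximation rather than an exact port: re.DOTALL stands in for (?s), the capturing group in the lookahead is dropped, and the names LOG, BLOCK_RE and blocks along with the shortened sample are assumptions made for the example.

```python
import re

# Shortened sample in the same shape as the log in the question (assumed data).
LOG = (
    "INFO 2015-07-30 06:50:48,208 Request: POST: /api/v1/jobs/\n"
    "Request Data: {u'job_id': u'15488'}\n"
    "Resp Status: 200\n"
    "INFO 2015-07-30 06:50:48,571 Request: POST: /api/v1/jobs/\n"
    "Exception Raised: NOTFOUND\n"
    "Resp Status: 200\n"
)

# INFO, then as few characters as possible (newlines included, via DOTALL)
# up to a newline that is followed by INFO or by the end of the text.
BLOCK_RE = re.compile(r"INFO.+?\n(?=INFO|$)", re.DOTALL)

# finditer yields match objects; group(0) is the whole matched block.
blocks = [match.group(0) for match in BLOCK_RE.finditer(LOG)]

for block in blocks:
    print(block)
```

Because the pattern consumes the final \n of each block, every block here ends with a newline, and the last block is only found when the text itself ends with one.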

...