I need to write a regular expression that groups log lines together up to the next INFO.
For example:
INFO 2015-07-30 06:50:48,208 Request: POST: /api/v1/jobs/
Request Data: {u'job_id': u'15488'}
Resp Status: 200
Resp Data: {'detail': 'ok'}
Resp Time: 0.0139939785004
INFO 2015-07-30 06:50:48,571 Request: POST: /api/v1/jobs/
Request Data: {u'job_id': u'15232 195049139026\r\n'}
Exception Raised: NOTFOUND
Resp Status: 200
Resp Data: {'detail': 'ok'}
Resp Time: 0.0570251941681
After applying the regular expression, the groups should be:
INFO 2015-07-30 06:50:48,208 Request: POST: /api/v1/jobs/
Request Data: {u'job_id': u'15488'}
Resp Status: 200
Resp Data: {'detail': 'ok'}
Resp Time: 0.0139939785004
INFO 2015-07-30 06:50:48,571 Request: POST: /api/v1/jobs/
Request Data: {u'job_id': u'15232 195049139026\r\n'}
Exception Raised: NOTFOUND
Resp Status: 200
Resp Data: {'detail': 'ok'}
Resp Time: 0.0570251941681
Answer 0 (score: 2)
/INFO(?:\n(?!INFO)|.)*/g
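This should do it: Demo.
The regex matches INFO, followed by this non-capturing group ((?:…)): either a newline (\n) that is not followed by another INFO ((?!INFO)), or (|) any character (.), repeated any number of times (*).
You might think it should be as simple as "INFO followed by anything, repeated", but unfortunately that would return the whole string as one giant match, so the negative lookahead (?!INFO) is required.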
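As a minimal sketch of how the pattern above behaves, here it is applied with Python's re module. The choice of Python and the shortened sample log (the LOG and BLOCK names are made up for illustration) are assumptions, not part of the original answer; the pattern itself is unchanged apart from dropping the /…/g delimiters, since findall already returns every match.

```python
import re

# Shortened sample log (assumption: a trimmed copy of the question's data).
LOG = (
    "INFO 2015-07-30 06:50:48,208 Request: POST: /api/v1/jobs/\n"
    "Request Data: {u'job_id': u'15488'}\n"
    "Resp Status: 200\n"
    "INFO 2015-07-30 06:50:48,571 Request: POST: /api/v1/jobs/\n"
    "Exception Raised: NOTFOUND\n"
    "Resp Status: 200\n"
)

# INFO, then any number of: a newline not followed by INFO, or any
# non-newline character. The dot does not match "\n" by default, so a
# newline can only be consumed through the first alternative.
BLOCK = re.compile(r"INFO(?:\n(?!INFO)|.)*")

blocks = BLOCK.findall(LOG)

for number, block in enumerate(blocks):
    print("--- block", number)
    print(block)
```

Note that the pattern is not anchored to the start of a line, so a continuation line that itself begins with the letters INFO would start a new block.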
Answer 1 (score: 2)
This grep command matches each of these blocks separately:
grep -zoP '(?s)INFO.+?\n(?=(INFO|$))' file
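-o # print only the matched part of the input
-z # treat the input as NUL-separated records, so the whole file is searched as one string and a match can span newlines
-P # interpret the pattern as a PCRE (Perl-compatible) regex
(?s) # DOTALL, so the dot matches newlines as well
INFO.+?\n # match INFO, then one or more of any character (non-greedy), up to a newline
(?=(INFO|$)) # lookahead: what follows that newline must be INFO or the end of the input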
Output:
INFO 2015-07-30 06:50:48,208 Request: POST: /api/v1/jobs/
Request Data: {u'job_id': u'15488'}
Resp Status: 200
Resp Data: {'detail': 'ok'}
Resp Time: 0.0139939785004
INFO 2015-07-30 06:50:48,571 Request: POST: /api/v1/jobs/
Request Data: {u'job_id': u'15232 195049139026\r\n'}
Exception Raised: NOTFOUND
Resp Status: 200
Resp Data: {'detail': 'ok'}
Resp Time: 0.0570251941681
...
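The same lazy-match-plus-lookahead idea can be tried outside grep. The following is a rough Python adaptation, not the original command: it assumes Python's re module, uses \Z in place of $ for the end of input, uses a lookahead without a capturing group so that findall returns whole matches, and runs on a shortened sample log (LOG and BLOCK are hypothetical names).

```python
import re

# Shortened sample log (assumption: a trimmed copy of the question's data).
LOG = (
    "INFO 2015-07-30 06:50:48,208 Request: POST: /api/v1/jobs/\n"
    "Request Data: {u'job_id': u'15488'}\n"
    "Resp Status: 200\n"
    "INFO 2015-07-30 06:50:48,571 Request: POST: /api/v1/jobs/\n"
    "Exception Raised: NOTFOUND\n"
    "Resp Status: 200\n"
)

# re.DOTALL plays the role of (?s): the dot also matches newlines.
# .+? is lazy, so each match stops at the first newline that is
# followed by INFO or by the end of the input.
BLOCK = re.compile(r"INFO.+?\n(?=INFO|\Z)", re.DOTALL)

blocks = BLOCK.findall(LOG)

for number, block in enumerate(blocks):
    print("--- block", number)
    print(block, end="")
```

Unlike the first answer's pattern, each match here keeps its trailing newline, so joining the blocks reproduces the input exactly.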