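I have many log files that contain both multi-line and single-line messages. I want to parse these messages, so I need to extract each individual message from the file. I am trying to match every message in a log file, including the multi-line ones, with a regular expression, but I cannot figure out how to make it match the last message in the string as well. Every new message starts with a date. The following example shows what I am trying to do: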
import regex as re
multi = """
2015-08-31T23:33:35.423Z INFO: disp24 [ process] (Log.java:124) [toSACLogger] - <?xml version="1.0" encoding="UTF-8"?>
<LifeSignRequest>
<Header>
<MessageTime>2015-08-29T05:41:24.0Z</MessageTime>
<Source>
<ProcessID>008</ProcessID>
</Source>
<Target>
<ProcessID>FSM</ProcessID>
</Target>
</Header>
<Sequence>9298</Sequence>
</LifeSignRequest>
2015-08-31T23:33:35.440Z INFO: disp0 [handleResponse] (HttpClient.java:320) [HttpClient.253_1]no connection or empty contents
2015-08-31T23:33:35.440Z INFO: disp0 [ process] (Log.java:124) [toMCSLogger] - <?xml version="1.0"?>
<LifeSignResponse>
<Header>
<MessageTime>2015-08-31T23:33:35.000Z</MessageTime>
<Source>
<ProcessID>FSM</ProcessID>
</Source>
<Target>
<ProcessID>MCS</ProcessID>
<InstanceID>3006</InstanceID>
</Target>
</Header>
<Signature>9298</Signature>
</LifeSignResponse>
2015-08-31T23:33:37.164Z INFO: disp23 [ process] (Log.java:124) [toSACLogger] - <?xml version="1.0" encoding="UTF-8"?>
<LifeSignRequest>
<Header>
<MessageTime>2015-08-31T23:33:36.0Z</MessageTime>
<Source>
<ProcessID>014</ProcessID>
</Source>
<Target>
<ProcessID>FSM</ProcessID>
</Target>
</Header>
<Sequence>110</Sequence>
</LifeSignRequest>
2015-08-31T23:33:37.189Z INFO: disp8 [handleResponse] (HttpClient.java:320) [HttpClient.253_7]no connection or empty contents
2015-08-31T23:33:37.189Z INFO: disp8 [ process] (Log.java:124) [toMCSLogger] - <?xml version="1.0"?>
<LifeSignResponse>
<Header>
<MessageTime>2015-08-31T23:33:37.000Z</MessageTime>
<Source>
<ProcessID>FSM</ProcessID>
</Source>
<Target>
<ProcessID>MCS</ProcessID>
<InstanceID>3005</InstanceID>
</Target>
</Header>
<Signature>110</Signature>
</LifeSignResponse>
"""
data = re.findall(r'^([0-9]{4}-[0-9]{2}-[0-9]{2}.*?)(?=^[0-9]{4}-[0-9]{2}-[0-9]{2})', multi, re.DOTALL|re.MULTILINE)
for row in data:
    print(row)
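The regular expression in the example above matches every message except the last one.
My question is: how can I match all the messages in the example string with a regular expression?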
Answer 0 (score: 1)
^([0-9]{4}-[0-9]{2}-[0-9]{2}.*?)(?=^[0-9]{4}-[0-9]{2}-[0-9]{2}|\Z)
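Your expression cannot match the last message because it uses a lazy dot match (.*?), which only stops where the lookahead succeeds. The lookahead requires another timestamp at the start of a line, and after the last message there is none, so that final match fails. \Z is defined as the end of the string (whereas $ matches the end of each line when re.MULTILINE is set), so adding it as an alternative in the lookahead gives the lazy match something else to stop at when there is no further timestamp to find.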
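The effect of the \Z alternative can be checked with a minimal sketch like the one below. It assumes the standard-library re module (the question imports the third-party regex module under the same name, and this pattern behaves the same in both), and it uses a shortened, made-up sample rather than the full log from the question. The names sample, DATE, pattern, and messages are only illustrative.

```python
import re

# Shortened, made-up sample in the same shape as the log in the question:
# every message starts with a date at the beginning of a line.
sample = (
    "2015-08-31T23:33:35.423Z INFO: first\n"
    "<A>\n"
    "</A>\n"
    "2015-08-31T23:33:35.440Z INFO: second\n"
    "2015-08-31T23:33:37.189Z INFO: third\n"
    "<B>\n"
    "</B>\n"
)

DATE = r"[0-9]{4}-[0-9]{2}-[0-9]{2}"

# Lazy match from a date at line start up to the next date at line start,
# or up to the end of the string (\Z) for the final message.
pattern = re.compile(
    r"^(" + DATE + r".*?)(?=^" + DATE + r"|\Z)",
    re.DOTALL | re.MULTILINE,
)

messages = pattern.findall(sample)
for message in messages:
    print(repr(message))
```

With this sample, findall returns three messages, and the last one runs to the end of the string. A date that appears in the middle of a line, such as the one inside a MessageTime element, does not start a new message because the lookahead is anchored with ^.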