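Here is my code: I have a script that reads a file, but not all the lines in my file look alike, and I only want to extract information from the lines that contain I Doc O:.
I tried using an if condition, but it still fails when there are lines that the regular expression does not match: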
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import re
def extraire(data):
    ms = re.match(r'(\S+).*?(O:\S+).*(R:\S+).*mid:(\d+)', data) # heure & mid
    return {'Heure':ms.group(1), 'mid':ms.group(2),"Origine":ms.group(3),"Destination":ms.group(4)}
tableau = []
fichier = open("/home/TEST/file.log")
f = fichier.readlines()
for line in f:
    if (re.findall(".*I Doc O:.*",line)):
        tableau = [extraire(line) for line in f ]
print tableau
fichier.close()
Here is a sample of some lines from my file; I want the first and the fourth lines:
01:09:25.258 mta Messages I Doc O:NVS:SMTP/alarm@yyy.xx R:NVS:SMS/+654811 mid:6261
01:09:41.965 mta Messages I Rep O:NVS:SMTP/alarmes.techniques@xxx.de R:NVS:SMS/+455451 mid:6261
01:09:41.965 mta Messages I Rep 6261 OK, Accepted (ID: 26)
08:14:14.469 mta Messages I Doc O:NVS:SMTP/alarm@xxxx.en R:NVS:SMS/+654646 mid:6262
08:14:30.630 mta Messages I Rep O:NVS:SMTP/alarm@azea.er R:NVS:SMS/+33688704859 mid:6262
08:14:30.630 mta Messages I Rep 6262 OK, Accepted (ID: 28)
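Answer 0 (score: 0)
From: http://docs.python.org/2/library/re.html
*?, +?, ?? The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn't desired; if the RE <.*> is matched against ...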
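To make the quoted passage concrete, here is a minimal sketch of greedy versus non-greedy matching. The input string is only an illustrative value chosen for this example.

```python
import re

# Illustrative input (an assumption for this example).
text = "<a> b <c>"

# Greedy: '.*' extends as far as possible, up to the last '>'.
greedy = re.match(r"<.*>", text).group(0)

# Non-greedy: '.*?' stops at the first '>' that lets the pattern match.
lazy = re.match(r"<.*?>", text).group(0)

print(greedy)  # <a> b <c>
print(lazy)    # <a>
```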
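Also, findall is best used on the whole buffer and returns a list. Because '.' does not match a newline by default, each match stays within a single line, so looping over the matches saves you from having to test every line of the file with a condition.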
buff = fichier.read()
matches = re.findall(".*?I Doc O:.*", buff)
for match in matches:
    tableau = ...
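If you would rather keep the line-by-line loop from the question, the sketch below shows one way it might be repaired. The likely cause of the failure is that the list comprehension runs extraire over every line of the file as soon as one line passes the if, and re.match returns None for the "Accepted" lines, so ms.group(1) raises an AttributeError. Here extraire and its dictionary keys are copied from the question unchanged; LOG_LINES is a hypothetical stand-in for fichier.readlines(), and print is written as a function call so the snippet also runs on Python 3.

```python
import re

# Hypothetical stand-in for fichier.readlines(); lines taken from the sample.
LOG_LINES = [
    "01:09:25.258 mta Messages I Doc O:NVS:SMTP/alarm@yyy.xx R:NVS:SMS/+654811 mid:6261\n",
    "01:09:41.965 mta Messages I Rep O:NVS:SMTP/alarmes.techniques@xxx.de R:NVS:SMS/+455451 mid:6261\n",
    "01:09:41.965 mta Messages I Rep 6261 OK, Accepted (ID: 26)\n",
]

def extraire(data):
    # Same pattern and keys as in the question.
    ms = re.match(r'(\S+).*?(O:\S+).*(R:\S+).*mid:(\d+)', data)
    return {'Heure': ms.group(1), 'mid': ms.group(2),
            'Origine': ms.group(3), 'Destination': ms.group(4)}

tableau = []
for line in LOG_LINES:
    # Only the current line is parsed, and only if it is a Doc line.
    if "I Doc O:" in line:
        tableau.append(extraire(line))

print(tableau)
```

The key change is that the condition guards a single call to extraire on the current line, instead of triggering a second pass over the whole file.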
- Here is my test code; can you tell me what it does that you do not want?
>>> import re
>>> a = """
... 01:09:25.258 mta Messages I Doc O:NVS:SMTP/alarm@yyy.xx R:NVS:SMS/+654811 mid:6261
... 01:09:41.965 mta Messages I Rep O:NVS:SMTP/alarmes.techniques@xxx.de R:NVS:SMS/+455451 mid:6261
... 01:09:41.965 mta Messages I Rep 6261 OK, Accepted (ID: 26)
... 08:14:14.469 mta Messages I Doc O:NVS:SMTP/alarm@xxxx.en R:NVS:SMS/+654646 mid:6262
... 08:14:30.630 mta Messages I Rep O:NVS:SMTP/alarm@azea.er R:NVS:SMS/+33688704859 mid:6262
... 08:14:30.630 mta Messages I Rep 6262 OK, Accepted (ID: 28)"""
>>> m = re.findall(".*?I Doc O:.*",a)
>>> m
['01:09:25.258 mta Messages I Doc O:NVS:SMTP/alarm@yyy.xx R:NVS:SMS/+654811 mid:6261', '08:14:14.469 mta Messages I Doc O:NVS:SMTP/alarm@xxxx.en R:NVS:SMS/+654646 mid:6262']
>>> tableau = []
>>> for line in m:
...     tableau.append( extraire(line) )
...
>>> tableau
[{'Origine': 'R:NVS:SMS/+654811', 'Destination': '6261', 'Heure': '01:09:25.258', 'mid': 'O:NVS:SMTP/alarm@yyy.xx'}, {'Origine': 'R:NVS:SMS/+654646', 'Destination': '6262', 'Heure': '08:14:14.469', 'mid': 'O:NVS:SMTP/alarm@xxxx.en'}]
You can also do this in one line:
>>> tableau = [ extraire(line) for line in re.findall( ".*?I Doc O:.*", fichier.read() ) ]
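Note that in the output above the keys do not line up with the values (for example, 'mid' holds the O: address), because the dictionary in extraire pairs the groups in a different order than the pattern captures them. As a possible variant, the sketch below uses named groups so each value carries its own label. It assumes the intended mapping is Heure = timestamp, Origine = the O: field, Destination = the R: field and mid = the number after mid:. DOC_LINE, extraire_tout and SAMPLE are hypothetical names introduced for this example.

```python
import re

# Assumed field mapping; the names come from the keys used in the question.
DOC_LINE = re.compile(
    r'^(?P<Heure>\S+).*?I Doc (?P<Origine>O:\S+)[ \t]+'
    r'(?P<Destination>R:\S+)[ \t]+mid:(?P<mid>\d+)',
    re.MULTILINE,  # let '^' match at the start of every line in the buffer
)

def extraire_tout(buff):
    """Return one dict per 'I Doc' line found in the whole buffer."""
    return [m.groupdict() for m in DOC_LINE.finditer(buff)]

# Hypothetical stand-in for fichier.read(); lines taken from the sample.
SAMPLE = """\
01:09:25.258 mta Messages I Doc O:NVS:SMTP/alarm@yyy.xx R:NVS:SMS/+654811 mid:6261
01:09:41.965 mta Messages I Rep O:NVS:SMTP/alarmes.techniques@xxx.de R:NVS:SMS/+455451 mid:6261
01:09:41.965 mta Messages I Rep 6261 OK, Accepted (ID: 26)
08:14:14.469 mta Messages I Doc O:NVS:SMTP/alarm@xxxx.en R:NVS:SMS/+654646 mid:6262
"""

tableau = extraire_tout(SAMPLE)
print(tableau)
```

Since finditer only yields successful matches, there is no None to guard against, and the filtering and extraction happen in a single pass over the buffer.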