The script I am working on currently runs three regular expression searches against a file. Take the following as input:
2018-01-22 04.02.03: Wurk: 98745061 (12345678)
Replies (pos: 2) are missing/not sent on assignment: Asdf (55461)
2018-01-22 04.02.03: Wurk: 98885612 (87654321)
Gorp: 98885612 is not registered for arrival!
Brork: 98885612 is not registered for arrival!
2018-01-22 04.02.08: Wurk: 88855521 (885052)
Blam: 12365479 is not registered for arrival!
Fork: 56564123 is not registered for arrival!
2018-01-22 04.02.08: Wurk: A0885521 (885052)
Blam: 12365479 is not registered for arrival!
Fork: 56564123 is not registered for arrival!
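Each regular expression finds lines in the file by the line's date and the first digit after Wurk:, and captures the eight digits/characters that follow Wurk:.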
import time, glob, re

logpath = glob.glob('path\\to\\log*.log')[0]
daysdate = time.strftime("%Y-%m-%d")  # today's date, e.g. 2018-01-22

nine = []
eight = []
seven = []
no_match = []

# One search per leading digit; group(1) is the eight characters after "Wurk: "
with open(logpath, "r") as readfile:
    for line in readfile:
        for match in re.finditer(daysdate + r'.*Wurk: (9.{7})', line):
            nine.append(match.group(1))
        for match in re.finditer(daysdate + r'.*Wurk: (8.{7})', line):
            eight.append(match.group(1))
        for match in re.finditer(daysdate + r'.*Wurk: (7.{7})', line):
            seven.append(match.group(1))

print("\nNine:\n%s\n" % ",\n".join(map(str, nine)) +
      "\nEight:\n%s\n" % ",\n".join(map(str, eight)) +
      "\nSeven:\n%s\n" % ",\n".join(map(str, seven)) +
      "\nNo matches found:\n%s\n" % ",\n".join(map(str, no_match)))
The output currently produced is:
Nine:
98745061,
98885612
Eight:
88855521
Seven:
No matches found:
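The problem now is to work out a regular expression that matches the eight digits/characters after Wurk: when they are not matched by any of the previous regular expressions. The new output should therefore be: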
Nine:
98745061,
98885612
Eight:
88855521
Seven:
No matches found:
A0885521
TL;DR
How do I write a regular expression that matches whatever the previous regular expressions did not match?
Answer 0 (score: 2)
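Regular expressions are not meant for grouping data; they are meant for finding it. Use a regular expression to extract the values, then group them in code: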
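import re

# text is assumed to hold the full contents of the log file
seven, eight, nine, no_match = [], [], [], []
wurk_map = {'7': seven,
            '8': eight,
            '9': nine}

# Extract the eight characters after every "Wurk: ", then bucket them by first character
wurks = re.findall(r'(?<=Wurk: ).{8}', text)
for wurk in wurks:
    wurk_map.get(wurk[0], no_match).append(wurk)

print(seven)     # []
print(eight)     # ['88855521']
print(nine)      # ['98745061', '98885612']
print(no_match)  # ['A0885521']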
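
The snippet above does not filter by date the way the question's patterns do. As a rough, self-contained sketch of the same extract-then-group idea with the date check added back, something like the following might work. The function name group_wurks, its parameters, and the SAMPLE_LOG constant are hypothetical names introduced here for illustration; the sample text is the input from the question.

```python
import re

# The question's sample input, embedded so the example runs on its own.
SAMPLE_LOG = """\
2018-01-22 04.02.03: Wurk: 98745061 (12345678)
Replies (pos: 2) are missing/not sent on assignment: Asdf (55461)
2018-01-22 04.02.03: Wurk: 98885612 (87654321)
Gorp: 98885612 is not registered for arrival!
Brork: 98885612 is not registered for arrival!
2018-01-22 04.02.08: Wurk: 88855521 (885052)
Blam: 12365479 is not registered for arrival!
Fork: 56564123 is not registered for arrival!
2018-01-22 04.02.08: Wurk: A0885521 (885052)
Blam: 12365479 is not registered for arrival!
Fork: 56564123 is not registered for arrival!
"""


def group_wurks(text, date, prefixes="789"):
    """Group the eight characters after 'Wurk: ' by their first character.

    Hypothetical helper: only entries on lines containing `date` are kept,
    and anything whose first character is not in `prefixes` goes to 'no_match'.
    """
    groups = {prefix: [] for prefix in prefixes}
    groups["no_match"] = []
    # '.' does not match a newline by default, so each match stays on one line.
    pattern = re.compile(re.escape(date) + r".*Wurk: (.{8})")
    for wurk in pattern.findall(text):
        groups.get(wurk[0], groups["no_match"]).append(wurk)
    return groups


groups = group_wurks(SAMPLE_LOG, "2018-01-22")
for name, values in groups.items():
    print(name, values)
```

If a regex-only answer is still preferred, a negated character class such as Wurk: ([^789].{7}) would capture only the values whose first character is not 7, 8, or 9, at the cost of one more pass over each line.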