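I have a file containing server IP addresses and the errors reported on those servers.
I need to capture the IPs of the servers that reported errors, together with the error messages.
I tried the code below, but it captures only the lines that match the regex, not the line above the match.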
import re

with open("log1.txt", "r") as a, open("output.txt", "a") as out:
    for line in a:
        if re.match(r"(\d+)", line):
            print(line, file=out)
Input:
---------------------------------------------------------------------
Errpt report for 192.1.152.10 ##
0717032319 T H ent2 ETHERNET DOWN
---------------------------------------------------------------------
Errpt report for 172.11.71.113 ##
0717032319 T H ent2 PROBLEM RESOLVED
0717032319 T H ent2 PROBLEM RESOLVED
---------------------------------------------------------------------
Errpt report for 172.1.79.114 ##
0717032319 T H ent3 PROBLEM RESOLVED
0717032319 T H ent2 PROBLEM RESOLVED
0717032319 T H ent5 PROBLEM RESOLVED
0717032319 T H ent6 PROBLEM RESOLVED
---------------------------------------------------------------------
Errpt report for 192.1.119.169 ##
---------------------------------------------------------------------
Errpt report for 192.11.119.129 ##
---------------------------------------------------------------------
Expected output:
---------------------------------------------------------------------
Errpt report for 192.1.152.10 ##
0717032319 T H ent2 ETHERNET DOWN
---------------------------------------------------------------------
Errpt report for 172.11.71.113 ##
0717032319 T H ent2 PROBLEM RESOLVED
0717032319 T H ent2 PROBLEM RESOLVED
---------------------------------------------------------------------
Errpt report for 172.1.79.114 ##
0717032319 T H ent3 PROBLEM RESOLVED
0717032319 T H ent2 PROBLEM RESOLVED
0717032319 T H ent5 PROBLEM RESOLVED
0717032319 T H ent6 PROBLEM RESOLVED
Answer 0 (score: 0)
Use itertools.tee to create two iterators over your input file, and use one of them to cache the previous line (for output).
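import itertools
import re

with open("log1.txt") as infile, open("output.txt", "w") as outfile:
    # Two iterators over the file; advancing one yields (previous, current) pairs.
    cache, infile = itertools.tee(infile)
    next(infile, None)
    for prev, line in zip(cache, infile):
        if re.match(r"\s*\d+", line):
            # Write the cached header only before the first error line of a block.
            if not re.match(r"\s*\d+", prev):
                outfile.write(prev)
            outfile.write(line)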
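Caching a single previous line recovers the "Errpt report" header but not the separator line above it, which the expected output also shows. A minimal sketch of one alternative is to buffer the separator and header until the first error line of a block appears. The function name filter_blocks and the ERROR_LINE pattern are illustrative, and the sketch assumes every error line starts with a 10-digit timestamp and every block starts with a line of hyphens, as in the sample input.

```python
import re

# Assumption: error lines begin with a 10-digit timestamp, optionally indented.
ERROR_LINE = re.compile(r"\s*\d{10}\s")

def filter_blocks(lines):
    """Yield only the lines of blocks that contain at least one error line."""
    pending = []  # separator/header lines not yet known to belong to an error block
    for line in lines:
        if ERROR_LINE.match(line):
            # The first error line of a block flushes the buffered separator and header.
            yield from pending
            pending = []
            yield line
        elif line.startswith("-"):
            # A separator starts a new block; a buffered header without errors is dropped.
            pending = [line]
        else:
            pending.append(line)
```

It could be used as outfile.writelines(filter_blocks(infile)) with the two files opened as in the code above. Because it streams line by line, the whole log never has to be held in memory.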
Answer 1 (score: 0)
My guess is that this expression would likely return the desired output:
Errpt report[\s\S]*?(?:\s*\d{10}\s+[A-Z].*)+
Test with re.findall:
import re
regex = r"Errpt report[\s\S]*?(?:\s*\d{10}\s+[A-Z].*)+"
test_str = """
---------------------------------------------------------------------
Errpt report for 192.1.152.10 ##
0717032319 T H ent2 ETHERNET DOWN
---------------------------------------------------------------------
Errpt report for 172.11.71.113 ##
0717032319 T H ent2 PROBLEM RESOLVED
0717032319 T H ent2 PROBLEM RESOLVED
---------------------------------------------------------------------
Errpt report for 172.1.79.114 ##
0717032319 T H ent3 PROBLEM RESOLVED
0717032319 T H ent2 PROBLEM RESOLVED
0717032319 T H ent5 PROBLEM RESOLVED
0717032319 T H ent6 PROBLEM RESOLVED
---------------------------------------------------------------------
Errpt report for 192.1.119.169 ##
---------------------------------------------------------------------
Errpt report for 192.11.119.129 ##
---------------------------------------------------------------------
"""
print(re.findall(regex, test_str, re.M))
['Errpt report for 192.1.152.10 ## \n\n 0717032319 T H ent2 ETHERNET DOWN', 'Errpt report for 172.11.71.113 ## \n\n 0717032319 T H ent2 PROBLEM RESOLVED\n 0717032319 T H ent2 PROBLEM RESOLVED', 'Errpt report for 172.1.79.114 ## \n\n 0717032319 T H ent3 PROBLEM RESOLVED\n 0717032319 T H ent2 PROBLEM RESOLVED\n 0717032319 T H ent5 PROBLEM RESOLVED\n 0717032319 T H ent6 PROBLEM RESOLVED']
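The expression is explained in the top-right panel of regex101.com, if you wish to explore, simplify, or modify it. There you can also watch how it matches against some sample inputs.
jex.im can be used to visualize regular expressions.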
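Since the question asks for the server IPs together with their error messages, a variant with capture groups may be more convenient than a list of whole-block strings. The following is only a sketch: the pattern BLOCK and the function errors_by_ip are illustrative names, not part of the answer above, and it assumes "\n" line endings (as produced by reading the file in text mode) and error lines that start with a 10-digit timestamp.

```python
import re

# Hypothetical pattern: group 1 is the IP, group 2 the run of error lines below it.
BLOCK = re.compile(
    r"^Errpt report for (\d{1,3}(?:\.\d{1,3}){3}) ##\s*\n"
    r"((?:[ \t]*\d{10}[ \t].*\n?)+)",
    re.M,
)

def errors_by_ip(text):
    """Return {ip: [error line, ...]} for servers that reported at least one error."""
    return {
        m.group(1): [ln.strip() for ln in m.group(2).splitlines()]
        for m in BLOCK.finditer(text)
    }
```

Headers with no error lines beneath them produce no match, so those servers are simply absent from the dictionary. If the same IP appeared in more than one block, the later block would overwrite the earlier one in this sketch.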
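Answer 2 (score: 0)
You could match the whole line of hyphens that starts each log entry and use a repeating pattern to match the following lines that start with 10 digits.
Instead of re.search, which finds only the first location where the regex pattern produces a match, you can use re.findall and write all matches back to the output.txt file.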
^-+\r?\nErrpt report for \d{1,3}(?:\.\d{1,3}){3} ##[\t ]*(?:\r?\n\s*\d{10}[ \t].*)+
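Explanation
^
    Start of a line (the code below passes re.M)
-+\r?\n
    Match - 1 or more times, followed by a newline
Errpt report for
    Match literally
\d{1,3}(?:\.\d{1,3}){3} ##
    Match an IP-like pattern, a space and ##
[\t ]*
    Match 0 or more spaces or tabs
(?:
    Non-capturing group
\r?\n\s*\d{10}
    Match a newline, 0 or more whitespace characters and 10 digits
[ \t].*
    Match a space or tab, then 0 or more of any character except a newline
)+
    Close the non-capturing group and repeat it 1 or more times
For example: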
import re

regex = r"^-+\r?\nErrpt report for \d{1,3}(?:\.\d{1,3}){3} ##[\t ]*(?:\r?\n\s*\d{10}[ \t].*)+"

with open("log1.txt", "r") as log1, open("output.txt", "w") as filteredLog:
    # Collect every separator + header + error-lines block, then write them out.
    output = re.findall(regex, log1.read(), re.M)
    filteredLog.write("\n".join(output))
Result
---------------------------------------------------------------------
Errpt report for 192.1.152.10 ##
0717032319 T H ent2 ETHERNET DOWN
---------------------------------------------------------------------
Errpt report for 172.11.71.113 ##
0717032319 T H ent2 PROBLEM RESOLVED
0717032319 T H ent2 PROBLEM RESOLVED
---------------------------------------------------------------------
Errpt report for 172.1.79.114 ##
0717032319 T H ent3 PROBLEM RESOLVED
0717032319 T H ent2 PROBLEM RESOLVED
0717032319 T H ent5 PROBLEM RESOLVED
0717032319 T H ent6 PROBLEM RESOLVED
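Note that \d{1,3}(?:\.\d{1,3}){3} matches only IP-like text, so a string such as 999.1.1.1 would also pass. If that matters, the matched addresses could be checked with the standard-library ipaddress module. This is a sketch under that assumption; IP_LIKE and valid_ips are illustrative names.

```python
import ipaddress
import re

# Same IP-like sub-pattern as above, with a capture group around the address.
IP_LIKE = re.compile(r"Errpt report for (\d{1,3}(?:\.\d{1,3}){3}) ##")

def valid_ips(text):
    """Return the header addresses that are well-formed IPv4 addresses."""
    found = []
    for candidate in IP_LIKE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)
        except ipaddress.AddressValueError:
            continue  # for example, an octet above 255
        found.append(candidate)
    return found
```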