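How do I capture a regex match and the line above it, and send both to a file?

Time: 2019-07-17 15:28:13

Tags: python regex

I have a file that contains server IP addresses and the errors reported on those servers.

I need to capture the IPs of the servers that reported errors, together with the error messages.

I tried the code below, but it only captures the regex match, not the line above the match.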

a=open("log1.txt", 'r')
for line in a:
    if re.match('(\d+)' , line):
        print(line, file=open('output.txt', 'a'))

Input:

---------------------------------------------------------------------
    Errpt report for 192.1.152.10 ## 

    0717032319 T H ent2 ETHERNET DOWN
---------------------------------------------------------------------
    Errpt report for 172.11.71.113 ##  

    0717032319 T H ent2 PROBLEM RESOLVED
    0717032319 T H ent2 PROBLEM RESOLVED
---------------------------------------------------------------------
    Errpt report for 172.1.79.114 ## 

    0717032319 T H ent3 PROBLEM RESOLVED
    0717032319 T H ent2 PROBLEM RESOLVED
    0717032319 T H ent5 PROBLEM RESOLVED
    0717032319 T H ent6 PROBLEM RESOLVED
---------------------------------------------------------------------
    Errpt report for 192.1.119.169 ## 

---------------------------------------------------------------------
    Errpt report for 192.11.119.129 ## 

---------------------------------------------------------------------

Expected output:

---------------------------------------------------------------------
Errpt report for 192.1.152.10 ## 

0717032319 T H ent2 ETHERNET DOWN
---------------------------------------------------------------------
Errpt report for 172.11.71.113 ##  

0717032319 T H ent2 PROBLEM RESOLVED
0717032319 T H ent2 PROBLEM RESOLVED
---------------------------------------------------------------------
Errpt report for 172.1.79.114 ## 

0717032319 T H ent3 PROBLEM RESOLVED
0717032319 T H ent2 PROBLEM RESOLVED
0717032319 T H ent5 PROBLEM RESOLVED
0717032319 T H ent6 PROBLEM RESOLVED

3 answers:

Answer 0 (score: 0)

Use itertools.tee to make two iterators over your input file, and use one of them to cache the previous line (for output).

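import itertools
import re

with open("log1.txt") as infile, open("output.txt", "w") as outfile:
    prev_lines, curr_lines = itertools.tee(infile)
    next(curr_lines, None)  # shift by one so zip yields (previous, current)
    for prev, line in zip(prev_lines, curr_lines):
        if re.match(r"\s*\d+", line):
            # Emit the line above only once per run of matching lines.
            if not re.match(r"\s*\d+", prev):
                outfile.write(prev)
            outfile.write(line)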
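
In the sample input a blank line separates each header from its first error line, so pairing a line with its predecessor does not by itself recover the header. As an alternative sketch (not the approach of the answer above), the log can be split into blocks at the dashed separator lines, keeping only the blocks that contain at least one error line. The helper name `filter_error_blocks` is hypothetical, and the code assumes separators consist solely of hyphens and error lines start with a 10-digit stamp.

```python
import re

SEPARATOR = re.compile(r"^\s*-+\s*$")      # a line made only of hyphens
ERROR_LINE = re.compile(r"^\s*\d{10}\s")   # a line starting with a 10-digit stamp


def filter_error_blocks(lines):
    """Return the lines of every report block that contains an error line.

    Hypothetical helper: assumes each block starts with a separator line.
    """
    kept, block, has_error = [], [], False
    for line in lines:
        if SEPARATOR.match(line):
            # A separator closes the previous block and opens a new one.
            if has_error:
                kept.extend(block)
            block, has_error = [line], False
        else:
            block.append(line)
            if ERROR_LINE.match(line):
                has_error = True
    if has_error:  # flush the last block if the input does not end with a separator
        kept.extend(block)
    return kept
```

Because the function accepts any iterable of lines, an open file can be passed directly, for example `dst.writelines(filter_error_blocks(src))` with `src` and `dst` opened on log1.txt and output.txt.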

Answer 1 (score: 0)

My guess is that this expression is likely to return the desired output:

Errpt report[\s\S]*?(?:\s*\d{10}\s+[A-Z].*)+

Test with re.findall:

import re

regex = r"Errpt report[\s\S]*?(?:\s*\d{10}\s+[A-Z].*)+"

test_str = """
---------------------------------------------------------------------
    Errpt report for 192.1.152.10 ## 

    0717032319 T H ent2 ETHERNET DOWN
---------------------------------------------------------------------
    Errpt report for 172.11.71.113 ##  

    0717032319 T H ent2 PROBLEM RESOLVED
    0717032319 T H ent2 PROBLEM RESOLVED
---------------------------------------------------------------------
    Errpt report for 172.1.79.114 ## 

    0717032319 T H ent3 PROBLEM RESOLVED
    0717032319 T H ent2 PROBLEM RESOLVED
    0717032319 T H ent5 PROBLEM RESOLVED
    0717032319 T H ent6 PROBLEM RESOLVED
---------------------------------------------------------------------
    Errpt report for 192.1.119.169 ## 

---------------------------------------------------------------------
    Errpt report for 192.11.119.129 ## 

---------------------------------------------------------------------

"""

print(re.findall(regex, test_str, re.M))

Output

['Errpt report for 192.1.152.10 ## \n\n    0717032319 T H ent2 ETHERNET DOWN', 'Errpt report for 172.11.71.113 ##  \n\n    0717032319 T H ent2 PROBLEM RESOLVED\n    0717032319 T H ent2 PROBLEM RESOLVED', 'Errpt report for 172.1.79.114 ## \n\n    0717032319 T H ent3 PROBLEM RESOLVED\n    0717032319 T H ent2 PROBLEM RESOLVED\n    0717032319 T H ent5 PROBLEM RESOLVED\n    0717032319 T H ent6 PROBLEM RESOLVED']
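
Since the question asks for the server IPs together with their error messages, the matches could be post-processed into a mapping. The following is a sketch under the assumption that every header has the form `Errpt report for <IP> ##`; the names `BLOCK` and `errors_by_ip` are hypothetical. It ties the header directly to the error lines instead of using the lazy `[\s\S]*?`, which could otherwise stretch from an error-free header into a later block that does contain errors.

```python
import re

# Same idea as the expression above, with named groups added for illustration.
BLOCK = re.compile(
    r"Errpt report for (?P<ip>\d{1,3}(?:\.\d{1,3}){3}) ##[ \t]*"
    r"(?P<errors>(?:\s*\d{10}\s+[A-Z].*)+)"
)


def errors_by_ip(text):
    """Map each IP that reported errors to its list of error lines."""
    result = {}
    for match in BLOCK.finditer(text):
        lines = [ln.strip() for ln in match.group("errors").splitlines() if ln.strip()]
        result.setdefault(match.group("ip"), []).extend(lines)
    return result
```

Servers whose report block has no error lines simply do not appear in the returned dictionary.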

Demo

The expression is explained in the top-right panel of regex101.com, if you want to explore, simplify, or modify it; there you can also see how it matches some sample input.

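Answer 2 (score: 0)

You can match the whole line of hyphens that starts each report, followed by the header line, and use a repeating pattern to match the subsequent lines that start with 10 digits.

Instead of re.search, which looks for the first location where the pattern produces a match, you can use re.findall and write all the matches to the output.txt file.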

^-+\r?\nErrpt report for \d{1,3}(?:\.\d{1,3}){3} ##[\t ]*(?:\r?\n\s*\d{10}[ \t].*)+

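Explanation

  • ^ Start of the string (with re.M, the start of each line)
  • -+\r?\n Match - 1+ times, followed by a newline
  • Errpt report for Match literally
  • \d{1,3}(?:\.\d{1,3}){3} ## Match an IP-like pattern, followed by a space and ##
  • [\t ]* Match a space or tab 0+ times
  • (?: Non-capturing group
    • \r?\n\s*\d{10} Match a newline, 0+ whitespace characters, and 10 digits
    • [ \t].* Match a space or tab, then any character except a newline 0+ times
  • )+ Close the non-capturing group and repeat it 1+ times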
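
The breakdown above can be written directly into the pattern with `re.VERBOSE`, which ignores unescaped whitespace and allows `#` comments inside the pattern. This is only a sketch: the name `REPORT` and the sample string are illustrative, and in verbose mode the literal spaces and `#` characters have to be escaped or placed in a character class. As in the original pattern, the header line is assumed not to be indented.

```python
import re

REPORT = re.compile(
    r"""
    ^-+\r?\n                      # a line of hyphens, then a newline
    Errpt[ ]report[ ]for[ ]       # literal header text
    \d{1,3}(?:\.\d{1,3}){3}       # IP-like pattern
    [ ]\#\#[\t ]*                 # space, two hashes, optional trailing blanks
    (?:
        \r?\n\s*\d{10}            # newline, optional whitespace, 10 digits
        [ \t].*                   # a blank, then the rest of the line
    )+                            # one or more error lines
    """,
    re.MULTILINE | re.VERBOSE,
)

sample = (
    "-----\n"
    "Errpt report for 192.1.152.10 ## \n"
    "\n"
    "0717032319 T H ent2 ETHERNET DOWN\n"
    "-----\n"
    "Errpt report for 192.1.119.169 ## \n"
    "\n"
    "-----\n"
)

# Only the first block has an error line, so only it is returned.
matches = REPORT.findall(sample)
```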

For example:

import re

regex = r"^-+\r?\nErrpt report for \d{1,3}(?:\.\d{1,3}){3} ##[\t ]*(?:\r?\n\s*\d{10}[ \t].*)+"

with open("log1.txt", "r") as log1, open("output.txt", "w") as filteredLog:
    output = re.findall(regex, log1.read(), re.M)
    filteredLog.write("\n".join(output))

Result

---------------------------------------------------------------------
Errpt report for 192.1.152.10 ## 

0717032319 T H ent2 ETHERNET DOWN
---------------------------------------------------------------------
Errpt report for 172.11.71.113 ##  

0717032319 T H ent2 PROBLEM RESOLVED
0717032319 T H ent2 PROBLEM RESOLVED
---------------------------------------------------------------------
Errpt report for 172.1.79.114 ## 

0717032319 T H ent3 PROBLEM RESOLVED
0717032319 T H ent2 PROBLEM RESOLVED
0717032319 T H ent5 PROBLEM RESOLVED
0717032319 T H ent6 PROBLEM RESOLVED
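
To illustrate the difference between `re.search` and `re.findall` mentioned in this answer, here is a small sketch. The simplified pattern and the sample text are made up for illustration and are not taken from the log in the question.

```python
import re

text = "0717032319 ent2 DOWN\n0717032320 ent3 DOWN\n"
pattern = r"^\d{10} .*"

first = re.search(pattern, text, re.M)    # match object for the first hit only
every = re.findall(pattern, text, re.M)   # list of all non-overlapping hits

print(first.group())
print(every)
```

`re.search` stops at the first matching line, while `re.findall` collects every matching line, which is why the latter suits writing all error blocks to a file.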