Using grep to delete multiple instances of a string but keep one instance

Time: 2015-03-10 17:05:22

Tags: python grep

I have a text file whose contents look like this:

[**] [1:1384:8] MISC UPnP malformed advertisement [**]
[Classification: Misc Attack] [Priority: 2] 
03/10-14:33:41.431255 192.168.1.111:40533 -> 255.255.255.255:1900
--
[**] [1:1384:8] MISC UPnP malformed advertisement [**]
[Classification: Misc Attack] [Priority: 2] 
03/10-14:34:11.421186 192.168.1.111:54602 -> 255.255.255.255:1900

[**] [1:1384:8] MISC UPnP malformed advertisement [**]
[Classification: Misc Attack] [Priority: 2] 
03/10-14:34:11.421186 192.168.1.111:54602 -> 255.255.255.255:1900

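The file can contain any number of these alerts. I want to search the file and keep only one instance of this alert, so that the report document stays short and tidy.

I was thinking of using grep to search for [1:1384:8] and delete every alert containing this string except one, but I am not sure how to do that. If anyone knows how, or can point me to a tutorial, I would appreciate it. I am also doing this from within a Python script.

The expected output collapses the repeated three-line sections into a single three-line section. For the input above: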

[**] [1:1384:8] MISC UPnP malformed advertisement [**]
[Classification: Misc Attack] [Priority: 2] 
03/10-14:34:11.421186 192.168.1.111:54602 -> 255.255.255.255:1900 

That is, only one instance in the report.

3 Answers:

Answer 0: (score: 0)

You can use the built-in open function:

# Read the whole file into memory as a list of lines.
with open('myfile.txt', 'r') as f:
    file_lines = f.readlines()

# Indices of every line that contains the alert ID.
cont_lines = [idx for idx, line in enumerate(file_lines)
              if "[1:1384:8]" in line]

for idx in cont_lines[1:]:  # skip the first instance of the string
    file_lines[idx] = ""    # blank out all the others

# Write the result back over the original file.
with open('myfile.txt', 'w') as f:
    f.writelines(file_lines)

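This removes every line containing the string except the first one. Note that only the matching line itself is removed; for each later alert, the two lines that follow it stay in the file.

Turning: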

[**] [1:1384:8] MISC UPnP malformed advertisement [**]
[Classification: Misc Attack] [Priority: 2] 
03/10-14:33:41.431255 192.168.1.111:40533 -> 255.255.255.255:1900

into:

[Classification: Misc Attack] [Priority: 2] 
03/10-14:33:41.431255 192.168.1.111:40533 -> 255.255.255.255:1900
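
If the goal is to drop each duplicate alert as a whole rather than only its first line, the same idea can be extended to skip the lines that follow a duplicate header. The following is only a minimal sketch, assuming every alert is exactly three consecutive lines and the ID appears only on the first of them. The function name `drop_duplicate_alerts` and its parameters are hypothetical, not part of the answer above.

```python
def drop_duplicate_alerts(lines, marker="[1:1384:8]", block_size=3):
    """Return `lines` with every block containing `marker` removed except the first.

    Assumption: each alert occupies `block_size` consecutive lines and
    `marker` appears only on the first line of a block.
    """
    kept = []
    seen_marker = False
    skip = 0  # number of upcoming lines that belong to a duplicate block
    for line in lines:
        if skip:
            skip -= 1
            continue
        if marker in line:
            if seen_marker:
                # Duplicate: drop this header and the rest of its block.
                skip = block_size - 1
                continue
            seen_marker = True
        kept.append(line)
    return kept
```

Separator lines such as `--` and blank lines are not part of any block, so this sketch leaves them in place. The result can be passed to `f.writelines(...)` in the same way as `file_lines` above.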

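Answer 1: (score: 0)

Depending on how you want to identify each error, and assuming it is fine to simply keep the last section, you can treat each line that starts with [**] as a key and take the two lines after it as the rest of that section. For example: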

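from itertools import islice

with open('data.txt') as fin:
    # Strip each line lazily so header lines can be compared cleanly.
    stripped = (line.strip() for line in fin)
    # Key on each '[**]' header; islice pulls the next two lines from the
    # same file iterator. A repeated header overwrites the earlier entry,
    # so the last section wins.
    errors = {line: list(islice(fin, 2)) for line in stripped if line.startswith('[**]')}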

This gives errors as a dict like:

{'[**] [1:1384:8] MISC UPnP malformed advertisement [**]': ['[Classification: Misc Attack] [Priority: 2] \n', '03/10-14:34:11.421186 192.168.1.111:54602 -> 255.255.255.255:1900']}
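
To turn a dict of this shape back into report text, each key can be emitted followed by its two stored lines. This is just a sketch under the assumption that the dict looks like the one shown above (header string mapped to a list of the following lines). The helper name `format_report` is hypothetical.

```python
def format_report(errors):
    """Build report text from a dict of {header: [line2, line3]}.

    Trailing whitespace and newlines on the stored lines are stripped,
    and a blank line is placed after each alert.
    """
    parts = []
    for header, rest in errors.items():
        parts.append(header)
        parts.extend(line.rstrip() for line in rest)
        parts.append("")  # blank line between alerts
    return "\n".join(parts)
```

The returned string could then be written out with something like `open('report.txt', 'w').write(format_report(errors))`, where `report.txt` is a placeholder filename.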

Answer 2: (score: 0)

If the sections always have three lines:

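from itertools import islice

seen = set()

with open("in.txt") as f, open("temp.txt", "w") as temp:
    for line in f:
        # skip separators ("--") and blank lines between sections,
        # which would otherwise make the split below raise IndexError
        if not line.startswith("[**]"):
            continue
        # use the ID, e.g. [1:1384:8], as the identifier
        spl = line.split(None, 2)[1]
        # the two lines that complete this section
        rest = list(islice(f, 2))
        # write the whole section only the first time we see this ID
        if spl not in seen:
            seen.add(spl)
            temp.write(line)
            temp.writelines(rest)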
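
The code above writes its result to temp.txt and leaves in.txt untouched. If the filtered file should take the place of the original, one option is `os.replace` from the standard library (Python 3.3+). This is only a sketch; the wrapper name `swap_in` is hypothetical, and it should be called after both files have been closed.

```python
import os


def swap_in(temp_path, original_path):
    """Replace original_path with temp_path, overwriting the original.

    os.replace renames temp_path onto original_path, so temp_path no
    longer exists afterwards. Both paths should be on the same filesystem.
    """
    os.replace(temp_path, original_path)
```

For the filenames used above, that would be `swap_in("temp.txt", "in.txt")`.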