Find matches across multiple lists and combine entries where 3 of 4 fields match

Time: 2016-12-05 18:57:32

Tags: python list

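Python newbie here, so please forgive any newbie-style syntax! If my code can be made more efficient in any way, please help with that too.

My question today is about making the output more efficient.

The input file I'm using contains raw data (from firewall logs) that looks like this: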

loc=15100850|time= 2Dec2016 22:30:17|action=accept|orig=10.10.10.10|i/f_dir=inbound|i/f_name=bond1|product=VPN-1 & FireWall-1|rule=126|rule_uid={10295A8E-3C83-11E5-A372-0A74A015A1A1}|rule_name=DP Syslog|src=10.10.10.10|s_port=51726|dst=10.10.10.10|service=514|proto=udp|__policy_id_tag=product=VPN-1 & FireWall-1[db_tag={CEACB4D6-DAE9-5141-A60F-2913D9FEF3F1};mgmt=CMA-WIN;date=1480655908;policy_name=fw1c-dca_b-win

The Python code I've written extracts the relevant data I need and formats it like this (and removes duplicates):

10.180.1.1      10.100.100.1    TCP 514
10.180.2.1      10.100.100.1    TCP 514
10.20.20.20     50.50.50.50     TCP 80
10.20.20.20     20.20.20.30     TCP 80

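What I need to do now is combine these results wherever 3 of the 4 fields match.

For example:

The first two lines above can be combined because the destination, proto, and service all match. They should be written to the file as: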

10.180.1.1;10.180.2.1 <tab> 10.100.100.1 <tab> TCP 514

The third and fourth lines above can be combined because the source, proto, and service all match. They should be written to the file as:

10.20.20.20 <tab> 50.50.50.50;20.20.20.30 <tab> TCP 80

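However, if multiple services are found, they need to be written to the file comma-separated rather than semicolon-separated, under a single proto, i.e.: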

10.20.20.20 50.50.50.50 TCP 443,80,8080

Is something like this possible?

import re
from sys import argv

script, filename = argv

def connection_list(filename):
    try:
        with open(filename, "r") as file:
            text = file.read()
    except IOError:
        print(filename, "Does not exist in the current directory. Are you in the correct directory???")
        return  # nothing to parse; without this, `text` below would be undefined

    sources = re.findall(r'src=(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})', text)
    dest = re.findall(r'dst=(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})', text)
    service = re.findall(r'service=(\d+)', text)
    proto = re.findall(r'proto=(\w+)', text)

    proto = [item.upper() for item in proto]
    sources = [item.split('=')[1] for item in sources]
    dest = [item.split('=')[1] for item in dest]

    with open("output.txt", "w") as TufinReq:
        for item in zip(sources, dest, proto, service):
            TufinReq.write('{}\t{}\t{} {}\n'.format(*item))

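    # Re-read the output and keep only the first occurrence of each line.
    unique_lines = []  # renamed from `list`, which shadowed the built-in
    with open("output.txt", "r") as f:
        for line in f:
            if line not in unique_lines:
                unique_lines.append(line)
    with open("output.txt", "w") as f:
        for line in unique_lines:
            f.write(line)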

connection_list(filename)

1 Answer:

Answer 0: (score: 0)

How you store the data is up to you, but let's assume the input is a list of lists:

a = """
    10.180.1.1      10.100.100.1    TCP 514
    10.180.2.1      10.100.100.1    TCP 514
    10.20.20.20     50.50.50.50     TCP 80
    10.20.20.20     20.20.20.30     TCP 80
"""
a = [line.strip().split() for line in a.split('\n') if line.strip()]

Now, itertools.groupby is one way to do it:

# use itertools.groupby, with a (hashable version of) all but the first item in each list
import itertools
keyfunc = lambda x: tuple( x[1:] )
grouped = itertools.groupby(sorted(a, key=keyfunc), key=keyfunc)

# There are now a number of ways you could walk through/format the result - here's one:
# each row is [src, dst, proto, service]
b = {key: [src for src, dst, proto, service in values] for key, values in grouped}

A collections.defaultdict, or the equivalent setdefault-based code below, is another way:

b = {}
# each row is [src, dst, proto, service]; group the sources by the other three fields
for src, dst, proto, service in a:
    b.setdefault((dst, proto, service), []).append(src)
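
For comparison, the collections.defaultdict variant mentioned above might look like the following. This is only a minimal sketch: the names `rows` and `by_target` are made up for illustration, and the rows are assumed to be in [src, dst, proto, service] order, as in the sample data.

```python
from collections import defaultdict

# Sample rows in [src, dst, proto, service] order (same data as the example input).
rows = [
    ['10.180.1.1', '10.100.100.1', 'TCP', '514'],
    ['10.180.2.1', '10.100.100.1', 'TCP', '514'],
    ['10.20.20.20', '50.50.50.50', 'TCP', '80'],
    ['10.20.20.20', '20.20.20.30', 'TCP', '80'],
]

# defaultdict(list) creates an empty list the first time a key is accessed,
# so no setdefault call is needed.
by_target = defaultdict(list)
for src, dst, proto, service in rows:
    by_target[(dst, proto, service)].append(src)
```

The result holds the same keys and values as the setdefault version; the only difference is that looking up a missing key silently creates an empty list.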

Either way, the input and output look like this:

>>> a
[['10.180.1.1', '10.100.100.1', 'TCP', '514'],
 ['10.180.2.1', '10.100.100.1', 'TCP', '514'],
 ['10.20.20.20', '50.50.50.50', 'TCP', '80'],
 ['10.20.20.20', '20.20.20.30', 'TCP', '80']]

>>> b
{('10.100.100.1', 'TCP', '514'): ['10.180.1.1', '10.180.2.1'],
 ('20.20.20.30', 'TCP', '80'): ['10.20.20.20'],
 ('50.50.50.50', 'TCP', '80'): ['10.20.20.20']}
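
To turn a grouped dictionary like this into the lines the question asks for, something along these lines could work. It is a sketch under assumptions: `format_rows` and `output_lines` are hypothetical names, and the layout (sources joined with ";", tab-separated fields, a space between proto and service) is taken from the example output in the question.

```python
# Grouped result as shown above: (dst, proto, service) -> list of sources.
b = {
    ('10.100.100.1', 'TCP', '514'): ['10.180.1.1', '10.180.2.1'],
    ('20.20.20.30', 'TCP', '80'): ['10.20.20.20'],
    ('50.50.50.50', 'TCP', '80'): ['10.20.20.20'],
}


def format_rows(grouped):
    """Build one 'src;src<TAB>dst<TAB>PROTO service' line per group."""
    lines = []
    for (dst, proto, service), sources in grouped.items():
        lines.append('{}\t{}\t{} {}'.format(';'.join(sources), dst, proto, service))
    return lines


output_lines = format_rows(b)
```

Each string in `output_lines` can then be written to the file followed by a newline. Note that this grouping only merges sources; rows that share a source but differ in destination stay on separate lines.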

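Note that if it is important to preserve any information about the order of the entries, you will have to take additional steps (the groupby approach, for instance, sorts the rows first).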
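
One possible way to cover all three merges from the question while keeping first-seen order is to apply the same grouping idea once per column. The sketch below is not a definitive solution: `merge_column` and `combine` are hypothetical helpers, the last two sample rows are invented to exercise the comma-separated service case, and it relies on dicts preserving insertion order (Python 3.7+).

```python
def merge_column(rows, col, sep):
    """Merge rows that agree on every column except `col`.

    Distinct values found in column `col` are joined with `sep`. Groups keep
    the order in which their key was first seen.
    """
    groups = {}
    for row in rows:
        key = row[:col] + row[col + 1:]      # the other three fields
        values = groups.setdefault(key, [])
        if row[col] not in values:           # also drops duplicate rows
            values.append(row[col])
    return [key[:col] + (sep.join(values),) + key[col:]
            for key, values in groups.items()]


def combine(rows):
    """Merge sources, then destinations (';'), then services (',')."""
    rows = [tuple(r) for r in rows]
    for col, sep in ((0, ';'), (1, ';'), (3, ',')):
        rows = merge_column(rows, col, sep)
    return rows


# Rows are (src, dst, proto, service); the last two are made-up examples.
sample = [
    ('10.180.1.1', '10.100.100.1', 'TCP', '514'),
    ('10.180.2.1', '10.100.100.1', 'TCP', '514'),
    ('10.20.20.20', '50.50.50.50', 'TCP', '80'),
    ('10.20.20.20', '20.20.20.30', 'TCP', '80'),
    ('10.30.30.30', '60.60.60.60', 'TCP', '443'),
    ('10.30.30.30', '60.60.60.60', 'TCP', '80'),
]
combined = combine(sample)
```

The passes are greedy, so the result depends on the order in which the columns are merged: once two destinations have been joined into one field, that row no longer matches a row with a single destination in the service pass. Whether that is acceptable depends on how the combined entries are meant to be used.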