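Python newbie here - please forgive any newbie-style syntax! If my code can be made more efficient in any way, please help with that as well.
My question today is about making the output more efficient.
The input file I am using contains raw data (from firewall logs) that looks like this: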
loc=15100850|time= 2Dec2016 22:30:17|action=accept|orig=10.10.10.10|i/f_dir=inbound|i/f_name=bond1|product=VPN-1 & FireWall-1|rule=126|rule_uid={10295A8E-3C83-11E5-A372-0A74A015A1A1}|rule_name=DP Syslog|src=10.10.10.10|s_port=51726|dst=10.10.10.10|service=514|proto=udp|__policy_id_tag=product=VPN-1 & FireWall-1[db_tag={CEACB4D6-DAE9-5141-A60F-2913D9FEF3F1};mgmt=CMA-WIN;date=1480655908;policy_name=fw1c-dca_b-win
The Python code I wrote extracts the relevant data I need and formats it like this (and removes duplicates):
10.180.1.1 10.100.100.1 TCP 514
10.180.2.1 10.100.100.1 TCP 514
10.20.20.20 50.50.50.50 TCP 80
10.20.20.20 20.20.20.30 TCP 80
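What I need to do now is combine these results whenever 3 of the 4 fields match. For example:
The first two lines above can be combined because dest, proto and service all match. To be written to the file as: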
10.180.1.1;10.180.2.1 <tab> 10.100.100.1 <tab> TCP 514
The third and fourth lines above can be combined because source, proto and service all match. To be written to the file as:
10.20.20.20 <tab> 50.50.50.50;20.20.20.30 <tab> TCP 80
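However, if multiple service objects are found, they need to be written to the file separated by commas rather than semicolons, under a single proto, i.e.: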
10.20.20.20 50.50.50.50 TCP 443,80,8080
Is something like this possible?
import re
from sys import argv

script, filename = argv


def connection_list(filename):
    try:
        with open(filename, "r") as file:
            text = file.read()
    except IOError:
        print(filename, "Does not exist in the current directory. Are you in the correct directory???")
        return  # nothing to parse if the file could not be read

    # Pull each field out of the raw log text
    sources = re.findall(r'src=(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})', text)
    dest = re.findall(r'dst=(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})', text)
    service = re.findall(r'service=(\d+)', text)
    proto = re.findall(r'proto=(\w+)', text)
    proto = [item.upper() for item in proto]
    sources = [item.split('=')[1] for item in sources]
    dest = [item.split('=')[1] for item in dest]

    with open("output.txt", "w") as TufinReq:
        for item in zip(sources, dest, proto, service):
            TufinReq.write('{}\t{}\t{} {}\n'.format(*item))

    # Re-read the output and drop duplicate lines, keeping the first occurrence
    with open("output.txt", "r") as f:
        unique_lines = []  # renamed from `list` to avoid shadowing the builtin
        for line in f:
            if line not in unique_lines:
                unique_lines.append(line)

    with open("output.txt", "w+") as f:
        for line in unique_lines:
            f.write(line)


connection_list(filename)
Answer 0 (score: 0)
You will be storing it in your own way, but let's assume the input is a list of lists:
a = """
10.180.1.1 10.100.100.1 TCP 514
10.180.2.1 10.100.100.1 TCP 514
10.20.20.20 50.50.50.50 TCP 80
10.20.20.20 20.20.20.30 TCP 80
"""
a = [line.strip().split() for line in a.split('\n') if line.strip()]
Now, itertools.groupby is one way:
# use itertools.groupby, with a (hashable version of) all but the first item in each list
import itertools
keyfunc = lambda x: tuple( x[1:] )
grouped = itertools.groupby(sorted(a, key=keyfunc), key=keyfunc)
# There are now a number of ways you could walk through/format the result - here's one:
# (each row is src, dst, proto, service, matching the column order of the input)
b = { key:[src for src,dst,proto,service in values] for key, values in grouped }
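One detail worth spelling out: itertools.groupby only groups consecutive items that share a key, which is why the input is sorted with the same key function first. A minimal sketch of the difference, using illustrative rows (the `rows` data and the `key_of` name are made up for this example, not taken from the question):

```python
import itertools

# Illustrative rows in the question's column order: src, dst, proto, service.
rows = [
    ["10.180.1.1", "10.100.100.1", "TCP", "514"],
    ["10.20.20.20", "50.50.50.50", "TCP", "80"],
    ["10.180.2.1", "10.100.100.1", "TCP", "514"],
]


def key_of(row):
    # Everything except the source, as a hashable tuple.
    return tuple(row[1:])


# Without sorting, the two rows for 10.100.100.1 are not adjacent,
# so groupby reports them as two separate groups.
unsorted_groups = [
    (key, [row[0] for row in group])
    for key, group in itertools.groupby(rows, key=key_of)
]

# After sorting by the same key, equal keys are adjacent and form one group.
sorted_groups = [
    (key, [row[0] for row in group])
    for key, group in itertools.groupby(sorted(rows, key=key_of), key=key_of)
]

print(len(unsorted_groups), len(sorted_groups))
```

Because sorted is stable, the sources inside each group keep their original relative order, but the groups themselves come out in key order rather than input order.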
collections.defaultdict, or the code below, which uses the equivalent dict.setdefault approach, is another way:
b = {}
for src, dst, proto, service in a:
    b.setdefault((dst, proto, service), []).append(src)
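For comparison, here is a rough sketch of the collections.defaultdict variant mentioned above. It assumes the same list-of-lists input, repeated here so the snippet runs on its own:

```python
from collections import defaultdict

# Same sample rows as in the answer: src, dst, proto, service.
a = [
    ["10.180.1.1", "10.100.100.1", "TCP", "514"],
    ["10.180.2.1", "10.100.100.1", "TCP", "514"],
    ["10.20.20.20", "50.50.50.50", "TCP", "80"],
    ["10.20.20.20", "20.20.20.30", "TCP", "80"],
]

# A missing key is created automatically with an empty list as its value.
b = defaultdict(list)
for src, dst, proto, service in a:
    b[(dst, proto, service)].append(src)

print(dict(b))
```

The only real difference from setdefault is that the empty list is supplied once, when the dict is created, instead of on every call.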
Either way, the input and output look like this:
>>> a
[['10.180.1.1', '10.100.100.1', 'TCP', '514'],
['10.180.2.1', '10.100.100.1', 'TCP', '514'],
['10.20.20.20', '50.50.50.50', 'TCP', '80'],
['10.20.20.20', '20.20.20.30', 'TCP', '80']]
>>> b
{('10.100.100.1', 'TCP', '514'): ['10.180.1.1', '10.180.2.1'],
('20.20.20.30', 'TCP', '80'): ['10.20.20.20'],
('50.50.50.50', 'TCP', '80'): ['10.20.20.20']}
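To get from a dict like this to the lines the question asks for, the sources can be joined with ";" and the key fields written out tab-separated. A minimal sketch, assuming the dict shape shown above (`format_groups` is a hypothetical helper name, not part of the original code):

```python
def format_groups(groups):
    """Turn {(dst, proto, service): [src, ...]} into tab-separated lines."""
    lines = []
    for (dst, proto, service), sources in groups.items():
        # Same layout as the question's output: src <tab> dst <tab> proto service
        lines.append("{}\t{}\t{} {}".format(";".join(sources), dst, proto, service))
    return lines


b = {
    ("10.100.100.1", "TCP", "514"): ["10.180.1.1", "10.180.2.1"],
    ("20.20.20.30", "TCP", "80"): ["10.20.20.20"],
    ("50.50.50.50", "TCP", "80"): ["10.20.20.20"],
}

lines = format_groups(b)
print("\n".join(lines))
```

Writing the result to a file is then one write of "\n".join(lines) plus a trailing newline.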
Note that if preserving any information about the order of the entries is important, you will have to take additional steps.
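One possible way to keep the original order is to rely on plain dicts preserving insertion order, which the language guarantees from Python 3.7 onward. The sketch below is an assumption-laden illustration rather than a definitive solution. `merge_column` is a hypothetical helper that merges rows agreeing on every column except one, keeping groups and values in first-seen order:

```python
def merge_column(rows, index, sep):
    """Merge rows that agree on every column except `index`.

    Values in the merged column are joined with `sep`. Groups and values
    keep the order in which they were first seen (needs Python 3.7+ dicts).
    """
    groups = {}
    for row in rows:
        key = tuple(row[:index] + row[index + 1:])
        values = groups.setdefault(key, [])
        if row[index] not in values:  # skip duplicate values
            values.append(row[index])
    merged = []
    for key, values in groups.items():
        new_row = list(key)
        new_row.insert(index, sep.join(values))
        merged.append(new_row)
    return merged


rows = [
    ["10.180.1.1", "10.100.100.1", "TCP", "514"],
    ["10.180.2.1", "10.100.100.1", "TCP", "514"],
    ["10.20.20.20", "50.50.50.50", "TCP", "80"],
    ["10.20.20.20", "20.20.20.30", "TCP", "80"],
]

by_source = merge_column(rows, 0, ";")     # merge sources first
by_both = merge_column(by_source, 1, ";")  # then merge destinations

for row in by_both:
    print("{}\t{}\t{} {}".format(*row))
```

Calling merge_column with index 3 and "," as the separator would combine services the same way. Whether the merges should be chained in this order, or applied to the raw rows independently, depends on how the combined rules are meant to be read.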