Python newbie here...
Suppose I have an output file (a firewall log file) that looks like this:
(source) (dest) (proto) (service)
10.10.10.1 20.20.20.1 TCP 80
10.10.10.1 30.30.30.1 TCP 80
10.10.10.1 40.40.40.1 TCP 514
10.10.10.1 40.40.40.1 TCP 443
I need to group this data whenever 3 of the 4 fields match. So, based on the output above, I need to write it to a new file that looks like:
10.10.10.1 20.20.20.1;30.30.30.1 TCP 80
OR
10.10.10.1 40.40.40.1 TCP 514, 443
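(Note the semicolon separating the IP addresses, and the comma separating the service objects in the second line.)
I have looked at Python's groupby method, but I can't get my head around it.
So in English (in my head):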
for every line in the file,
    if source and/or dest and/or proto and/or service match any other line in the file
        combine on one line and write to file (with a semicolon if separating IP
        addresses and a comma if separating service objects)
import re
from itertools import groupby
from sys import argv

#Written by Clyde Colbert - f7cmb14

script, filename = argv


def getcolumns(cols):
    # Build a key function that extracts the given column indexes from a row.
    def f(row):
        return tuple(row[i] for i in cols)
    return f


def connection_list(filename):
    try:
        with open(filename, "r") as file:
            text = file.read()
    except IOError:
        print(filename, "Does not exist in the current directory. Are you in the correct directory???")
        return  # nothing to parse without the input file

    # Pull each field out of the raw firewall log.
    sources = re.findall(r'src=(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})', text)
    dest = re.findall(r'dst=(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})', text)
    service = re.findall(r'service=(\d+)', text)
    proto = re.findall(r'proto=(\w+)', text)
    proto = [item.upper() for item in proto]
    sources = [item.split('=')[1] for item in sources]
    dest = [item.split('=')[1] for item in dest]

    with open(filename + "OUTPUT.txt", "w") as TufinReq:
        for item in zip(sources, dest, proto, service):
            TufinReq.write('{}\t{}\t{} {}\n'.format(*item))

    # Drop duplicate lines while keeping their first-seen order.
    # (The name "list" is avoided so the built-in list() stays usable below.)
    with open(filename + "OUTPUT.txt", "r") as f:
        unique_lines = []
        for line in f:
            if line not in unique_lines:
                unique_lines.append(line)
    with open(filename + "OUTPUT.txt", "w") as f:
        for line in unique_lines:
            f.write(line)

    # Split every line into its four fields so the columns can be indexed.
    data = [tuple(line.split()) for line in unique_lines]

    cols = (0, 2, 3)
    # groupby only merges adjacent rows, so sort by the same key first.
    data.sort(key=getcolumns(cols))
    for k, v in groupby(data, getcolumns(cols)):
        print(k, list(v))


connection_list(filename)
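Answer 0 (score: 0)
groupby(iterable, keyfunc) works by grouping consecutive items that have the same key (the value returned by keyfunc).
To accomplish your task, you can have keyfunc return several items from a row.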
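Because groupby only compares neighbouring items, the input normally has to be sorted by the same key first. The following is a minimal sketch of that behaviour; the word list and the first_letter helper are made up purely for illustration and are not part of the firewall data.

```python
from itertools import groupby

# Hypothetical sample data, chosen so that one "a" word is not adjacent to the others.
words = ["apple", "avocado", "banana", "apricot"]


def first_letter(word):
    # Key function: items with the same first letter belong together.
    return word[0]


# Unsorted: "apricot" is separated from the other "a" words, so it forms its own group.
unsorted_groups = [(k, list(g)) for k, g in groupby(words, first_letter)]

# Sorted by the same key: all words sharing a first letter land in a single group.
sorted_groups = [
    (k, list(g))
    for k, g in groupby(sorted(words, key=first_letter), first_letter)
]

print(unsorted_groups)
print(sorted_groups)
```

Since sorted() is stable, the words inside each group keep their original relative order.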
For simplicity, let's assume you already have the data in the following format:
data=[
('10.10.10.1', '20.20.20.1', 'TCP', '80'),
('10.10.10.1', '30.30.30.1', 'TCP', '80'),
('10.10.10.1', '40.40.40.1', 'TCP', '514'),
('10.10.10.1', '40.40.40.1', 'TCP', '443')
]
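So if you want to find the rows where the source, proto, and service match (column indexes 0, 2, and 3), you can build a key from those columns.
Let's write a small closure to extract those columns (it will be your keyfunc):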
def getcolumns(cols):
    def f(row):
        return tuple(row[i] for i in cols)
    return f
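As an aside, the standard library's operator.itemgetter can build a comparable key function when two or more column indexes are given. The sketch below only checks that both approaches produce the same key for one sample row; the variable names closure_key and getter_key are illustrative.

```python
from operator import itemgetter


def getcolumns(cols):
    # Same closure as above: returns a function that picks the given columns.
    def f(row):
        return tuple(row[i] for i in cols)
    return f


row = ('10.10.10.1', '20.20.20.1', 'TCP', '80')

closure_key = getcolumns((0, 2, 3))(row)
# With several indexes, itemgetter returns a tuple of the selected items.
getter_key = itemgetter(0, 2, 3)(row)

print(closure_key == getter_key)
```

Note that itemgetter called with a single index returns the bare item rather than a one-element tuple, so the closure is the more uniform choice if the number of key columns can vary.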
Let's look at the result you get:
>>> cols = (0,2,3)
>>> data.sort(key=getcolumns(cols))
>>> for k, v in groupby(data, getcolumns(cols)):
...     print(k, list(v))
...
('10.10.10.1', 'TCP', '443') [('10.10.10.1', '40.40.40.1', 'TCP', '443')]
('10.10.10.1', 'TCP', '514') [('10.10.10.1', '40.40.40.1', 'TCP', '514')]
('10.10.10.1', 'TCP', '80') [('10.10.10.1', '20.20.20.1', 'TCP', '80'), ('10.10.10.1', '30.30.30.1', 'TCP', '80')]
You probably want to exclude the results whose group has length 1 (no match):
>>> cols = (0,2,3)
>>> data.sort(key=getcolumns(cols))
>>> for k, v in groupby(data, getcolumns(cols)):
...     v = list(v)
...     if len(v) == 1: continue
...     print(k, v)
...
('10.10.10.1', 'TCP', '80') [('10.10.10.1', '20.20.20.1', 'TCP', '80'), ('10.10.10.1', '30.30.30.1', 'TCP', '80')]
Now it only takes a little processing to convert this into the output format you are looking for:
>>> cols = (0,2,3)
>>> data.sort(key=getcolumns(cols))
>>> for k, v in groupby(data, getcolumns(cols)):
...     v = list(v)
...     if len(v) == 1: continue
...     print(*(';'.join(set(r[i] for r in v)) for i in range(len(v[0]))))
...
10.10.10.1 20.20.20.1;30.30.30.1 TCP 80
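(If you want to use this implementation but need to preserve the order of the rows, use an OrderedSet, since a plain set does not guarantee any order.)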
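One way to keep the order without a third-party OrderedSet is dict.fromkeys, which removes duplicates while keeping first-seen order on Python 3.7+. The sketch below is an assumption-laden variation on the loop above, not a drop-in replacement: merge_rows and its parameters are hypothetical names, and the same helper is reused for both of the groupings the question asks about.

```python
from itertools import groupby


def merge_rows(rows, key_cols, sep):
    """Group rows on key_cols and join the values of every column with sep."""
    def key(row):
        return tuple(row[i] for i in key_cols)

    merged = []
    # sorted() is stable, so rows sharing a key keep their original relative order.
    for _, group in groupby(sorted(rows, key=key), key):
        group = list(group)
        if len(group) == 1:
            continue  # no other row matched on these columns
        fields = []
        for i in range(len(group[0])):
            # dict.fromkeys drops duplicates but keeps first-seen order.
            fields.append(sep.join(dict.fromkeys(r[i] for r in group)))
        merged.append(" ".join(fields))
    return merged


data = [
    ('10.10.10.1', '20.20.20.1', 'TCP', '80'),
    ('10.10.10.1', '30.30.30.1', 'TCP', '80'),
    ('10.10.10.1', '40.40.40.1', 'TCP', '514'),
    ('10.10.10.1', '40.40.40.1', 'TCP', '443'),
]

# Same source, proto, and service: destinations are joined with a semicolon.
by_dest = merge_rows(data, (0, 2, 3), ";")
# Same source, destination, and proto: services are joined with a comma.
by_service = merge_rows(data, (0, 1, 2), ", ")

print(by_dest)
print(by_service)
```

Writing the lines to a new file is then a matter of looping over by_dest and by_service and calling write on an open file handle.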