I have a CSV file that contains the following data.
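So far I have been able to remove exact duplicates. Now I need to remove rows where src = dest && dest = src && message == message. In addition, for rows where src = src && dest = dest || src = dest && dest = src, if one of them is tagged "infection", I want to remove the ones tagged "other". Basically, these rows should be treated as the same connection. This is the data containing the duplicates I want to remove: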
192.168.136.192,2848,100.100.100.212,6667,"other"
100.100.100.212,6667,192.168.136.192,2848,"other"
100.100.100.212,6667,192.168.136.192,2848,"CHAT IRC message"
192.168.61.74,4662,69.192.30.179,80,"other"
192.168.107.87,4662,69.192.30.179,80,"other"
192.168.107.87,4662,69.192.30.179,80,"infection"
192.168.177.85,4662,69.192.30.179,80,"infection"
192.168.177.85,4662,69.192.30.179,80,"other"
192.168.118.168,4662,69.192.30.179,80,"infection"
192.168.118.168,4662,69.192.30.179,80,"other"
192.168.110.111,4662,69.192.30.179,80,"infection"
Basically, this is what I have so far:
with open(r'alerts.csv','r') as in_file, open('alertsfix.csv','w') as out_file:
    seen = set() # set for fast O(1) amortized lookup
    for line in in_file:
        if line in seen: continue # skip duplicate
        seen.add(line)
        out_file.write(line)
Conditions:
src/prt/dest/prt/msg
1. a/a1/b/b1/c
2. 2a/2a1/2b/2b1/2c
or, with source and destination swapped:
if a==2b && a1==2b1 && b==2a && b1==2a1 && c==2c
delete one of them, since they are equal
I am new to Python, so any guidance would be greatly appreciated.
Answer 0: (score: 0)
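First, you have to define the conditions for equality. For example, the following code treats two rows as equal only if both conditions hold: the two endpoint addresses (IP and port) are the same, in either direction, and the messages are the same. The two addresses are placed in a frozenset, so their order does not matter. You can use frozenset (a built-in immutable set) to build a key for each row, which allows lookups in the seen set: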
with open('alerts.csv', 'r') as in_file, open('alertsfix.csv', 'w') as out_file:
    seen = set()
    for line in in_file:
        line = line.strip()
        if len(line) > 0:
            # maxsplit=4 keeps any commas inside the message field intact
            src_ip, src_port, dst_ip, dst_port, msg = line.split(',', 4)
            src = '{}:{}'.format(src_ip, src_port)
            dst = '{}:{}'.format(dst_ip, dst_port)
            # the inner frozenset makes the key independent of direction
            key = frozenset([
                frozenset([src, dst]),
                msg,
            ])
            if key not in seen:
                seen.add(key) # we add 'key' to the set
                out_file.write(line + '\n') # strip() removed the newline, so add it back
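The key above only covers rows with the same message. The question also asks to drop "other" rows when the same connection has an "infection" row. A minimal sketch of one way to do that follows. It assumes the whole file fits in memory, that the labels are exactly "infection" and "other" as in the sample data, and that the output may be grouped by connection instead of keeping the original line order. The function name dedupe_connections and its parameters are hypothetical, chosen only for this illustration.

```python
import csv
import io

# A few rows taken from the sample data in the question.
SAMPLE = (
    '192.168.136.192,2848,100.100.100.212,6667,"other"\n'
    '100.100.100.212,6667,192.168.136.192,2848,"other"\n'
    '100.100.100.212,6667,192.168.136.192,2848,"CHAT IRC message"\n'
    '192.168.107.87,4662,69.192.30.179,80,"other"\n'
    '192.168.107.87,4662,69.192.30.179,80,"infection"\n'
)


def dedupe_connections(rows, preferred="infection", fallback="other"):
    """Keep one row per (connection, message), ignoring direction.

    Assumption: a row tagged `fallback` is dropped only when the same
    connection also has a row tagged `preferred`.
    """
    by_conn = {}  # connection key -> {message: first row seen with it}
    for row in rows:
        if not row:  # skip blank lines
            continue
        src_ip, src_port, dst_ip, dst_port, msg = row
        # frozenset of both endpoints: A->B and B->A produce the same key
        conn = frozenset([(src_ip, src_port), (dst_ip, dst_port)])
        by_conn.setdefault(conn, {}).setdefault(msg, row)

    result = []
    for messages in by_conn.values():
        for msg, row in messages.items():
            if msg == fallback and preferred in messages:
                continue  # the connection has a more specific label
            result.append(row)
    return result


if __name__ == "__main__":
    rows = csv.reader(io.StringIO(SAMPLE))
    for row in dedupe_connections(rows):
        print(",".join(row))
```

To run this on the real file, pass csv.reader(open('alerts.csv', newline='')) instead of the in-memory sample and write the result with csv.writer. Note that csv.reader strips the quotes around the message, so pass quoting=csv.QUOTE_NONNUMERIC to csv.writer if the output should be quoted again.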
Does this help you accomplish your task?