CSV sorting and row removal in Python

Time: 2019-03-09 18:06:18

Tags: python csv

I have a CSV file that contains the following data.


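So far I have been able to remove exact duplicates. Now I need to remove rows where src = dest && dest = src && message == message, that is, the same connection seen in the opposite direction with the same message. In addition, for rows where (src = src && dest = dest) || (src = dest && dest = src), if one of them is marked "infected" (the "infection" message in the data), the ones marked "other" should be removed. Basically, treat them as the same connection. Here is the data: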

192.168.136.192,2848,100.100.100.212,6667,"other"
100.100.100.212,6667,192.168.136.192,2848,"other"
100.100.100.212,6667,192.168.136.192,2848,"CHAT IRC message"
192.168.61.74,4662,69.192.30.179,80,"other"
192.168.107.87,4662,69.192.30.179,80,"other"
192.168.107.87,4662,69.192.30.179,80,"infection"
192.168.177.85,4662,69.192.30.179,80,"infection"
192.168.177.85,4662,69.192.30.179,80,"other"
192.168.118.168,4662,69.192.30.179,80,"infection"
192.168.118.168,4662,69.192.30.179,80,"other"
192.168.110.111,4662,69.192.30.179,80,"infection"

Basically, this is what I have so far to remove the duplicates:

with open(r'alerts.csv', 'r') as in_file, open('alertsfix.csv', 'w') as out_file:
    seen = set()  # set for fast O(1) amortized lookup
    for line in in_file:
        if line in seen:
            continue  # skip exact duplicate

        seen.add(line)
        out_file.write(line)

Conditions:

src/prt/dest/prt/msg
1. a/a1/b/b1/c
2. 2a/2a1/2b/2b1/2c

if a==2b && a1==2b1 && b==2a && b1==2a1 && c==2c
    delete one of them, since they are equal
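
The condition above can be sketched in Python as a comparison of two parsed rows. This is only an illustration: the function name `same_connection` and the 5-tuple row layout are assumptions, not part of the original code.

```python
def same_connection(row_a, row_b):
    """Return True if both rows describe the same connection with the same message.

    Each row is assumed to be a 5-tuple: (src_ip, src_port, dst_ip, dst_port, msg).
    """
    src_a, dst_a = row_a[0:2], row_a[2:4]
    src_b, dst_b = row_b[0:2], row_b[2:4]
    same_direction = src_a == src_b and dst_a == dst_b
    reversed_direction = src_a == dst_b and dst_a == src_b
    return (same_direction or reversed_direction) and row_a[4] == row_b[4]


row1 = ("192.168.136.192", "2848", "100.100.100.212", "6667", "other")
row2 = ("100.100.100.212", "6667", "192.168.136.192", "2848", "other")
row3 = ("100.100.100.212", "6667", "192.168.136.192", "2848", "CHAT IRC message")

print(same_connection(row1, row2))  # True: endpoints swapped, same message
print(same_connection(row1, row3))  # False: the messages differ
```

Comparing every pair of rows this way would be quadratic in the number of rows, so it mainly serves to pin down what "equal" means here.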

I am new to Python; any guidance would be greatly appreciated.

1 answer:

Answer 0: (score: 0)

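First, you have to define what equality means. For example, the following code treats two rows as equal only when both of these conditions hold: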

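  • The two participating addresses are the same (both IP and port); I put the two addresses into a frozenset, so their order does not matter.
  • The message is the same.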
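
As a quick illustration of why the order of the two addresses stops mattering, here is a minimal sketch. The variable names are made up for the example; the addresses come from the sample data.

```python
a = "192.168.136.192:2848"
b = "100.100.100.212:6667"

# Same two endpoints in either order, same message
forward = frozenset([frozenset([a, b]), "other"])
backward = frozenset([frozenset([b, a]), "other"])
# Same endpoints, different message
different_msg = frozenset([frozenset([a, b]), "CHAT IRC message"])

print(forward == backward)       # True
print(forward == different_msg)  # False

seen = {forward}
print(backward in seen)          # True: equal frozensets hash the same
```

Because a frozenset is hashable, it can be stored in a set or used as a dict key, which a regular set cannot.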

You can use a frozenset (a built-in immutable set) to build a key for each row, and look that key up in the seen set:

with open('alerts.csv', 'r') as in_file, open('alertsfix.csv', 'w') as out_file:
    seen = set()
    for line in in_file:
        line = line.strip()
        if len(line) > 0:
            # NOTE: a plain split assumes the message itself contains no commas
            src_ip, src_port, dst_ip, dst_port, msg = line.split(',')
            src = '{}:{}'.format(src_ip, src_port)
            dst = '{}:{}'.format(dst_ip, dst_port)
            key = frozenset([
                frozenset([src, dst]),  # order-independent pair of endpoints
                msg,
            ])

            if key not in seen:
                seen.add(key)                # we add 'key' to the set
                out_file.write(line + '\n')  # strip() removed the newline, so add it back

Does this help you accomplish your task?
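
The question also asks that an "other" row be dropped when the same connection has a row marked as infected, which the key-based approach alone does not cover. One possible sketch, under stated assumptions: the function name `dedupe_connections` is hypothetical, the label strings "infection" and "other" are taken from the sample data, and rows are parsed with the standard `csv` module so that quoted messages are handled.

```python
import csv
import io

INFECTED = "infection"  # assumption: label as it appears in the sample data
OTHER = "other"


def dedupe_connections(rows):
    """Collapse rows that describe the same connection.

    rows: iterable of [src_ip, src_port, dst_ip, dst_port, msg] lists.
    Returns the kept rows, grouped by connection in first-seen order.
    """
    by_conn = {}  # connection key -> {msg: first row seen with that msg}
    for row in rows:
        if len(row) != 5:
            continue  # skip blank or malformed lines
        src_ip, src_port, dst_ip, dst_port, msg = row
        # Direction-independent key for the pair of endpoints
        conn = frozenset([(src_ip, src_port), (dst_ip, dst_port)])
        by_conn.setdefault(conn, {}).setdefault(msg, row)

    kept = []
    for by_msg in by_conn.values():
        drop_other = INFECTED in by_msg
        for msg, row in by_msg.items():
            if drop_other and msg == OTHER:
                continue  # an infection row exists for this connection
            kept.append(row)
    return kept


sample = '''192.168.136.192,2848,100.100.100.212,6667,"other"
100.100.100.212,6667,192.168.136.192,2848,"other"
100.100.100.212,6667,192.168.136.192,2848,"CHAT IRC message"
192.168.107.87,4662,69.192.30.179,80,"other"
192.168.107.87,4662,69.192.30.179,80,"infection"
'''

result = dedupe_connections(csv.reader(io.StringIO(sample)))
for row in result:
    print(row)
```

With this sample, three rows remain: the first "other" row, the "CHAT IRC message" row, and the "infection" row. To write the result to a file, `csv.writer(out_file).writerows(result)` with the file opened using `newline=''` should work; note that the default writer settings only quote fields when needed, so the quotes around the messages are not reproduced.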