I have a CSV file with values like the following:
Destination Address Count Device Action
10.0.0.1 5 accept
10.0.0.2 4 deny
10.0.0.3 6 accept
10.0.0.2 8 accept
10.0.0.3 6 deny
10.0.0.1 2 accept
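And so on. (I have left out the other columns of the CSV file.)
I wrote the following code to sum the counts for each unique address where the action is accept, and to return [IP, sum] values: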
import csv

with open(r'C:\Users\Traffic to blacklisted IP.csv') as oublip:
    oublip_list = list(csv.DictReader(oublip))
# One list per column of interest, all in row order
action = [column['Device Action'] for column in oublip_list]
daddr1 = set(b['Destination Address'] for b in oublip_list)
daddrl1 = [column['Destination Address'] for column in oublip_list]
sumcount1 = [int(column['Sum']) for column in oublip_list]
daddr1o = []
for act in action:
    if act == 'accept':
        for daddr in daddr1:
            sum = 0
            for index, c in enumerate(daddrl1):
                if c == daddr:
                    sum = sum + sumcount1[index]
            daddr1o.append([daddr, sum])
# Drop the duplicate [address, sum] pairs, then sort by sum, largest first
daddr2o = [list(t) for t in set(map(tuple, daddr1o))]
daddr2o.sort(key=lambda x: (x[1]), reverse=True)
print(daddr2o)
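Is there a better way?
Answer 0 (score: 0)
Import defaultdict. Don't use the csv reader if you don't have to; splitting each line should be enough. Use a hash (a dict) to map each IP address to its running count, and use a defaultdict so that the count is automatically initialized to 0 the first time a new IP address is seen.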
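from collections import defaultdict

counts = defaultdict(int)  # missing keys start at 0
with open(r'C:\Input\traffic to IP.csv', "r") as csvfile:
    for line in csvfile:
        # Splits on whitespace; use line.split(',') for comma-separated data
        l = line.split()
        # The length check skips blank or short lines
        if len(l) >= 3 and l[2] == 'accept':
            counts[l[0]] += int(l[1])
print("\n".join("%s : %d" % (k, v) for k, v in counts.items()))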
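To try the defaultdict approach without touching a real file, here is a minimal sketch that runs on an in-memory copy of the sample rows from the question. The names `SAMPLE` and `sum_accepted` are made up for illustration, and the sketch assumes whitespace-separated columns with the address, count, and action in the first three fields of each data row. It also sorts the totals into the [IP, sum] list the question asked for.

```python
from collections import defaultdict

# Hypothetical in-memory copy of the sample rows from the question.
SAMPLE = """\
Destination Address Count Device Action
10.0.0.1 5 accept
10.0.0.2 4 deny
10.0.0.3 6 accept
10.0.0.2 8 accept
10.0.0.3 6 deny
10.0.0.1 2 accept
"""


def sum_accepted(lines):
    """Return [[ip, total], ...] for 'accept' rows, largest total first."""
    counts = defaultdict(int)  # unseen addresses start at 0
    for line in lines:
        fields = line.split()
        # The header row and any blank or short lines fail this check and are skipped.
        if len(fields) >= 3 and fields[2] == 'accept':
            counts[fields[0]] += int(fields[1])
    return sorted(([ip, total] for ip, total in counts.items()),
                  key=lambda pair: pair[1], reverse=True)


result = sum_accepted(SAMPLE.splitlines())
print(result)
```

Each row is visited once, so this avoids the nested loops in the question's code.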
Answer 1 (score: 0)
Perhaps this is what you are after:
import csv


def get_device_counts(event_list):
    # Map each destination address to the total 'Sum' of its accepted events
    device_counts = {}
    for event in event_list:
        if event['Device Action'] != 'accept':
            continue
        if event['Destination Address'] not in device_counts:
            device_counts[event['Destination Address']] = 0
        device_counts[event['Destination Address']] += int(event['Sum'])
    return device_counts


def print_device_counts(device_counts):
    # Print the addresses ordered by total, largest first
    for key in sorted(device_counts, key=device_counts.get, reverse=True):
        value = device_counts[key]
        print('%s: %d' % (key, value))


with open('data.csv', 'r') as csvfile:
    event_list = csv.DictReader(csvfile)
    device_counts = get_device_counts(event_list)
print_device_counts(device_counts)
With the following data:
Destination Address,Sum,Device Action
10.0.0.1,5,accept
10.0.0.2,4,deny
10.0.0.3,6,accept
10.0.0.2,8,accept
10.0.0.3,6,deny
10.0.0.1,2,accept
it produces the following output:
10.0.0.2: 8
10.0.0.1: 7
10.0.0.3: 6
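The same aggregation can also be sketched with `collections.Counter` from the standard library, whose `most_common()` method returns the totals already sorted in descending order. The helper name `accepted_totals` is hypothetical, and the sample data is read from an in-memory string instead of data.csv so that the snippet is self-contained.

```python
import csv
import io
from collections import Counter

# Same sample data as above, held in memory instead of data.csv.
DATA = """\
Destination Address,Sum,Device Action
10.0.0.1,5,accept
10.0.0.2,4,deny
10.0.0.3,6,accept
10.0.0.2,8,accept
10.0.0.3,6,deny
10.0.0.1,2,accept
"""


def accepted_totals(rows):
    """Sum the 'Sum' column per destination address for 'accept' rows."""
    totals = Counter()
    for row in rows:
        if row['Device Action'] == 'accept':
            totals[row['Destination Address']] += int(row['Sum'])
    return totals


totals = accepted_totals(csv.DictReader(io.StringIO(DATA)))
ranked = totals.most_common()  # list of (address, total), largest total first
for address, total in ranked:
    print('%s: %d' % (address, total))
```

A Counter behaves like a dict whose missing keys read as 0, so the explicit "not in device_counts" check is not needed.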
Answer 2 (score: 0)
I think you are all working too hard. pandas handles this kind of problem well:
import pandas

df = pandas.read_csv('YOUR_PATH_HERE', sep='WHATEVER_SEP_U_USE')
# number of unique values in the column
print(df['COLUMN_NAME_HERE'].unique().shape[0])
# or, the number of occurrences of each unique value
print(df['COLUMN_NAME_HERE'].value_counts())
Example:
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 1, 3, 4, 5, 5, 3, 1, 5, np.nan],
                   'B': [1, 1, 3, 5, 0, 0, np.nan, 9, 0, 0],
                   'C': ['AA1233445', 'AA1233445', 'rmacy', 'Idaho Rx', 'Ab123455', 'TV192837', 'RX', 'Ohio Drugs', 'RX12345', 'USA Pharma'],
                   'D': [123456, 123456, 1234567, 12345678, 12345, 12345, 12345678, 123456789, 1234567, np.nan],
                   'E': ['Assign', 'Unassign', 'Assign', 'Ugly', 'Appreciate', 'Undo', 'Assign', 'Unicycle', 'Assign', 'Unicorn']})
print(df)
# use the code below to get the number of unique values
print(df['A'].unique().shape[0])
# output is 5 (unique() counts NaN as a value)
# use the code below to get the count for each unique value
print(df['A'].value_counts())
# output below (NaN is excluded; the order of tied values may vary by pandas version)
# 5.0    3
# 1.0    3
# 3.0    2
# 4.0    1
Hope this helps!
Edit: to sum the counts grouped by a particular column:
# keep only the accepted rows, then total 'Count' per destination address
df2 = df[df['Device Action'] == 'accept']
df3 = df2.groupby('Destination Address')['Count'].sum()
print(df3)
You may need to adjust the column names slightly to match your file.
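Applied to the sample rows from the question, the grouping step might look like the sketch below. It assumes the column names `Destination Address`, `Count`, and `Device Action` from the question's header, and it builds the frame in memory rather than reading the CSV. The variable names `accepted` and `totals` are illustrative only.

```python
import pandas as pd

# Sample rows from the question, built in memory (column names are assumed
# to match the question's header).
df = pd.DataFrame({
    'Destination Address': ['10.0.0.1', '10.0.0.2', '10.0.0.3',
                            '10.0.0.2', '10.0.0.3', '10.0.0.1'],
    'Count': [5, 4, 6, 8, 6, 2],
    'Device Action': ['accept', 'deny', 'accept', 'accept', 'deny', 'accept'],
})

# Keep only the accepted rows, total 'Count' per address, largest total first.
accepted = df[df['Device Action'] == 'accept']
totals = (accepted.groupby('Destination Address')['Count']
          .sum()
          .sort_values(ascending=False))
print(totals)
```

If a list of [IP, sum] pairs is wanted, as in the question, `[list(pair) for pair in totals.items()]` converts the resulting Series.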