Read a csv file, sum a specific column, return the values

Time: 2017-03-15 13:09:06

Tags: python list csv

I have a CSV file with values like the following:

Destination Address  Count   Device Action
10.0.0.1             5       accept
10.0.0.2             4       deny
10.0.0.3             6       accept
10.0.0.2             8       accept
10.0.0.3             6       deny
10.0.0.1             2       accept

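And so on. (I have omitted the other columns of the csv file.)

I wrote the following code to sum the count for each unique address whose action is accept, and return [IP, sum] values: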

import csv

with open(r'C:\Users\Traffic to blacklisted IP.csv') as oublip:
    oublip_list = list(csv.DictReader(oublip))    
action=[column['Device Action'] for column in oublip_list]
daddr1 = set(b['Destination Address'] for b in oublip_list)
daddrl1 = [column['Destination Address'] for column in oublip_list]
sumcount1 = [int(column['Sum']) for column in oublip_list]
daddr1o = []
for act in action:
    if act=='accept':
        for daddr in daddr1:
            sum=0
            for index, c in enumerate(daddrl1):
                if c==daddr:
                    sum=sum+sumcount1[index]
            daddr1o.append([daddr, sum])
daddr2o = [list(t) for t in set(map(tuple, daddr1o))]
daddr2o.sort(key=lambda x: (x[1]), reverse=True)
print(daddr2o)

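Is there a better way?

3 answers:

Answer 0: (score: 0)

Use defaultdict from collections. Don't use the csv reader if you don't have to; a plain split should be all you need. Use a hash to map each IP address to its running count, and use defaultdict so that the count is automatically initialized to 0 the first time a new IP address appears.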

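from collections import defaultdict

counts = defaultdict(int)  # unseen IP addresses start at 0
with open(r'C:\Input\traffic to IP.csv', "r") as csvfile:
    for line in csvfile:
        # Assumes whitespace-separated columns as displayed in the question;
        # for a comma-separated file use line.strip().split(',') instead.
        l = line.split()
        # Skip blank or short lines; the header row never equals 'accept'.
        if len(l) >= 3 and l[2] == 'accept':
            counts[l[0]] += int(l[1])

print("\n".join("%s : %d" % (k, v) for k, v in counts.items()))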
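
If the file really is comma-separated rather than whitespace-aligned as it is displayed in the question, the same defaultdict idea might be sketched as follows. This is only an illustration: the function name sum_accepted and the inline sample text are assumptions, and the column names are the ones used in the question's own code.

```python
import csv
import io
from collections import defaultdict

def sum_accepted(csvfile):
    """Sum the 'Sum' column per destination address for 'accept' rows."""
    counts = defaultdict(int)  # missing keys start at 0
    for row in csv.DictReader(csvfile):
        if row['Device Action'] == 'accept':
            counts[row['Destination Address']] += int(row['Sum'])
    return dict(counts)

# Hypothetical in-memory sample standing in for the real file.
sample = io.StringIO(
    "Destination Address,Sum,Device Action\n"
    "10.0.0.1,5,accept\n"
    "10.0.0.2,4,deny\n"
    "10.0.0.3,6,accept\n"
    "10.0.0.2,8,accept\n"
    "10.0.0.3,6,deny\n"
    "10.0.0.1,2,accept\n"
)
totals = sum_accepted(sample)
# Print addresses ordered by descending total.
for ip, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print('%s : %d' % (ip, total))
```

With a real file, the open file object would be passed in place of the StringIO sample.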

Answer 1: (score: 0)

Maybe this is what you are after:

import csv

def get_device_counts(event_list):
    device_counts = {}
    for event in event_list:
        if event['Device Action'] != 'accept':
            continue
        if event['Destination Address'] not in device_counts:
            device_counts[event['Destination Address']] = 0
        device_counts[event['Destination Address']] += int(event['Sum'])
    return device_counts

def print_device_counts(device_counts):
    for key in sorted(device_counts, key=device_counts.get, reverse=True):
        value = device_counts[key]
        print('%s: %d' % ( key, value ))

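# Open the file in a with block so it is closed once the rows are consumed.
with open('data.csv', 'r', newline='') as csvfile:
    event_list = csv.DictReader(csvfile)
    device_counts = get_device_counts(event_list)
print_device_counts(device_counts)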

With the following data:

Destination Address,Sum,Device Action
10.0.0.1,5,accept
10.0.0.2,4,deny
10.0.0.3,6,accept
10.0.0.2,8,accept
10.0.0.3,6,deny
10.0.0.1,2,accept

This gives the following output:

10.0.0.2: 8
10.0.0.1: 7
10.0.0.3: 6
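
The explicit membership check in get_device_counts could also be folded into collections.Counter, whose most_common() method returns (key, total) pairs in descending order of total. A minimal sketch, assuming the same column names as above; count_accepted and the inline events list are hypothetical stand-ins for the rows a csv.DictReader would yield.

```python
from collections import Counter

def count_accepted(events):
    """Accumulate the 'Sum' field per destination address for 'accept' rows."""
    totals = Counter()
    for event in events:
        if event['Device Action'] == 'accept':
            totals[event['Destination Address']] += int(event['Sum'])
    return totals

# Hypothetical rows, shaped like the dicts csv.DictReader produces.
events = [
    {'Destination Address': '10.0.0.1', 'Sum': '5', 'Device Action': 'accept'},
    {'Destination Address': '10.0.0.2', 'Sum': '4', 'Device Action': 'deny'},
    {'Destination Address': '10.0.0.3', 'Sum': '6', 'Device Action': 'accept'},
    {'Destination Address': '10.0.0.2', 'Sum': '8', 'Device Action': 'accept'},
    {'Destination Address': '10.0.0.3', 'Sum': '6', 'Device Action': 'deny'},
    {'Destination Address': '10.0.0.1', 'Sum': '2', 'Device Action': 'accept'},
]
ranked = count_accepted(events).most_common()
print(ranked)
```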

Answer 2: (score: 0)

I think you are all working too hard. pandas handles this perfectly:

import pandas as pd
df = pd.read_csv('YOUR_PATH_HERE', sep='WHATEVER_SEP_U_USE')
print(df['COLUMN_NAME_HERE'].unique().shape[0])
# or
print(df['COLUMN_NAME_HERE'].value_counts())

Example:

import pandas as pd
import numpy as np
df = pd.DataFrame({'A' : [1,1,3,4,5,5,3,1,5,np.nan],
                    'B' : [1,1,3,5,0,0,np.nan,9,0,0],
                    'C' : ['AA1233445','AA1233445', 'rmacy','Idaho Rx','Ab123455','TV192837','RX','Ohio Drugs','RX12345','USA Pharma'],
                    'D' : [123456,123456,1234567,12345678,12345,12345,12345678,123456789,1234567,np.nan],
                    'E' : ['Assign','Unassign','Assign','Ugly','Appreciate','Undo','Assign','Unicycle','Assign','Unicorn',]})
print(df)
# use code below to get the number of unique values
print(df['A'].unique().shape[0])
#output is 5


# use this code below for the count for each unique value 
print(df['A'].value_counts())
#output below 

5.0    3
1.0    3
3.0    2
4.0    1

Hope this helps!

Edit: to sum the counts grouped by a particular column:

# Keep only the accepted rows, then total the Count column per address.
df2 = df[df['Device Action'] == 'accept']
df3 = df2.groupby('Destination Address')['Count'].sum()
print(df3)

You may need to adjust the column names slightly.
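
To tie the pandas approach back to the question's data, here is a small end-to-end sketch. It assumes the file is comma-separated and that the columns are named Destination Address, Count and Device Action as in the question's listing; the inline csv_text stands in for the real file and is purely illustrative.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for the CSV file.
csv_text = (
    "Destination Address,Count,Device Action\n"
    "10.0.0.1,5,accept\n"
    "10.0.0.2,4,deny\n"
    "10.0.0.3,6,accept\n"
    "10.0.0.2,8,accept\n"
    "10.0.0.3,6,deny\n"
    "10.0.0.1,2,accept\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Filter to accepted rows, total Count per address, largest total first.
accepted = df[df['Device Action'] == 'accept']
totals = (accepted.groupby('Destination Address')['Count']
          .sum()
          .sort_values(ascending=False))

# Convert to the [IP, sum] pairs the question asks for.
result = [[ip, int(total)] for ip, total in totals.items()]
print(result)
```

Selecting the Count column before calling sum() keeps the text columns out of the aggregation.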