Counting occurrences of a value in each of many columns of a CSV

Time: 2016-07-08 16:37:32

Tags: python csv

I have a CSV like this:

col1,col2,col3
t,t,t
f,f,f
t,f,t

The file is very large (50 MB) and contains many columns.

I need to count the number of t values in each column.

I tried this:

import csv
import collections

col1 = collections.Counter()
with open('file.csv') as input_file:
    for row in csv.reader(input_file, delimiter=','):
        col1[row[0]] += 1

print 'Number of t in col1: %s' % col1['t']

But this only counts the first column (col1). How can I count multiple columns?

2 answers:

Answer 0: (score: 1)

import csv

# Maps column index -> number of 't' cells seen in that column
totals = {}

with open('file.csv') as input_file:
    for row in csv.reader(input_file, delimiter=','):
        for column, cell in enumerate(row):
            # Start every column at zero the first time it appears
            if column not in totals:
                totals[column] = 0
            if cell == 't':
                totals[column] += 1

for column in totals:
    print 'column %d has %d trues' % (column, totals[column])
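
The same idea can be combined with the `collections.Counter` approach from the question, keeping one counter per column. The following is only a sketch under some assumptions: it targets Python 3 (the code above uses the Python 2 `print` statement), it assumes the first row is a header, and the function name `count_values_per_column` is hypothetical. The sample data from the question is fed in through `io.StringIO` so the example runs without a file on disk.

```python
import csv
import io
from collections import Counter


def count_values_per_column(lines):
    """Return {column name: Counter of cell values} for an iterable of CSV lines."""
    reader = csv.reader(lines)
    header = next(reader)  # assumption: the first row holds the column names
    counters = [Counter() for _ in header]
    for row in reader:
        # zip stops at the shorter sequence, so cells beyond the header are ignored
        for counter, cell in zip(counters, row):
            counter[cell] += 1
    return dict(zip(header, counters))


# Sample data from the question; for a real file, pass open('file.csv', newline='')
sample = io.StringIO("col1,col2,col3\nt,t,t\nf,f,f\nt,f,t\n")
counts = count_values_per_column(sample)
for name, counter in counts.items():
    print('Number of t in %s: %d' % (name, counter['t']))
```

Because the file is read row by row, memory use depends on the number of distinct values per column rather than on the file size, which should be fine for a 50 MB file.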

Answer 1: (score: 0)

This counts the number of t values in each column. I am assuming they are all lowercase, but if that is not the case you can easily change it.

t_count = []
with open('file.csv') as f:
    for line in f:
        for col_num, col in enumerate(line.rstrip().split(',')):
            if len(t_count) < col_num + 1:
                t_count.append(0)
            if col == "t":
                t_count[col_num] += 1
print t_count

  

Output for the sample file above:

[2, 1, 2]

This gives the number of t values for each column, so index 0 is col1, and so on.
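
If the values are not all lowercase, the comparison can be made case-insensitive, and the header row can be used to label the counts by column name instead of by index. This is a rough sketch rather than a definitive version: it assumes Python 3, a header in the first line, and no quoted fields containing commas (a plain `split(',')` does not handle those). The function name `count_true_cells` and its `true_value` parameter are hypothetical.

```python
import io


def count_true_cells(lines, true_value='t'):
    """Count cells equal to true_value (ignoring case) in each column."""
    t_count = []
    wanted = true_value.lower()
    for line in lines:
        for col_num, col in enumerate(line.rstrip('\r\n').split(',')):
            # Grow the list the first time a new column index is seen
            if len(t_count) < col_num + 1:
                t_count.append(0)
            if col.strip().lower() == wanted:
                t_count[col_num] += 1
    return t_count


# Mixed-case sample; for a real file, use open('file.csv') instead of StringIO
sample = io.StringIO("col1,col2,col3\nT,t,t\nf,f,f\nt,F,T\n")
header = sample.readline().rstrip('\r\n').split(',')  # consume the header row
totals = count_true_cells(sample)
print(dict(zip(header, totals)))
```

Reading the header separately keeps it out of the counts and lets `zip` pair each total with its column name.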