Python csv: count the items in one column, grouped by the name in another column

Time: 2014-04-09 18:02:40

Tags: python csv count

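I'm new to Python and new to programming. I have a large CSV file (~5k items), and I need two of its columns to compute some counts. The best way to explain what I need is to show you a few rows of the CSV: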

Name column               OPTIONALDATA5 column 
Collaborative Desk  Broward
Collaborative Desk  Broward
Academic Desk           Broward
Academic Desk           Broward
Academic Desk           Broward
Academic Desk           Broward
Collaborative Desk  Broward
Collaborative Desk  Broward
Collaborative Desk  Broward
Collaborative Desk  Broward
Broward             Broward
Alachua             Alachua
Collaborative Desk  Alachua
Collaborative Desk  Alachua
Collaborative Desk  Alachua
Collaborative Desk  Alachua
Collaborative Desk  Alachua

For the example above, I just want a result like this:

Broward:
Collaborative Desk - 6
Academic Desk - 4
Broward - 1

Alachua:
Collaborative Desk - 5
Alachua - 1

Perhaps finish one library completely, then move on to the next library in the spreadsheet.

I've started writing the code, but I'd like to know whether there is a better way to do this.

2 answers:

Answer 0 (score: 3)

Assuming the data is tab-delimited, here is one way to get what you want:

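import csv
from collections import defaultdict, Counter

data = defaultdict(list)
# The with block closes the file for us; newline='' is what the csv docs recommend
with open('data', newline='') as input_file:
    csv_reader = csv.reader(input_file, delimiter='\t')
    for row in csv_reader:
        # Group each name (column 0) under its library (column 1)
        data[row[1]].append(row[0])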

data will now contain:

{'Alachua': ['Alachua', 'Collaborative Desk', 'Collaborative Desk', 'Collaborative Desk', 'Collaborative Desk', 'Collaborative Desk'], 
 'Broward': ['Collaborative Desk', 'Collaborative Desk', 'Academic Desk', 'Academic Desk', 'Academic Desk', 'Academic Desk', 'Collaborative Desk', 'Collaborative Desk', 'Collaborative Desk', 'Collaborative Desk', 'Broward']}

You can iterate over each key's list of values and tally the counts yourself, or use Python's Counter class:

# Sorting the keys keeps the output order stable (alphabetical by library)
for k, v in sorted(data.items()):
    print(k)
    print(Counter(v))

This prints:

Alachua
Counter({'Collaborative Desk': 5, 'Alachua': 1})
Broward
Counter({'Collaborative Desk': 6, 'Academic Desk': 4, 'Broward': 1})
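
To get closer to the layout asked for in the question, each library's Counter can be printed one entry per line using most_common(). This is only a minimal sketch of that idea: the inline sample rows stand in for the CSV, and the helper name format_counts is made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows in (name, library) order, standing in for the CSV.
rows = [
    ("Collaborative Desk", "Broward"),
    ("Academic Desk", "Broward"),
    ("Collaborative Desk", "Broward"),
    ("Broward", "Broward"),
    ("Alachua", "Alachua"),
    ("Collaborative Desk", "Alachua"),
]


def format_counts(rows):
    """Return report lines: a 'Library:' header, then 'Name - count' lines."""
    grouped = defaultdict(Counter)
    for name, library in rows:
        grouped[library][name] += 1

    lines = []
    for library, counts in grouped.items():
        lines.append(f"{library}:")
        # most_common() yields (name, count) pairs, highest count first.
        for name, count in counts.most_common():
            lines.append(f"{name} - {count}")
        lines.append("")  # blank line between libraries
    return lines


report = format_counts(rows)
print("\n".join(report))
```

Using a defaultdict(Counter) skips the intermediate lists entirely, which may matter little at ~5k rows but keeps the grouping and counting in one pass.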

Answer 1 (score: 1)

This also works (assuming your file is \t-delimited):

import csv
import collections

# Outer key: library (column 1); inner key: name (column 0); value: count
results = collections.defaultdict(lambda: collections.defaultdict(int))

with open('sample.csv', newline='') as f_in:
    rdr = csv.reader(f_in, delimiter='\t')
    next(rdr)  # skip the header row
    for row in rdr:
        results[row[1]][row[0]] += 1

# Sorting the libraries keeps the output order stable
for k, v in sorted(results.items()):
    print(k)
    for k2, v2 in v.items():
        print("    %s - %s" % (k2, v2))

Output:

Alachua
    Alachua - 1
    Collaborative Desk - 5
Broward
    Collaborative Desk - 6
    Academic Desk - 4
    Broward - 1
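
Both answers index columns by position. If the file has a header row, a variation is to look the columns up by name with csv.DictReader and count (library, name) pairs in a single Counter. The sketch below assumes the headers are exactly Name and OPTIONALDATA5 and that the file is tab-delimited; the inline sample and the helper name count_pairs are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical tab-separated sample using the two headers from the question.
sample = (
    "Name\tOPTIONALDATA5\n"
    "Collaborative Desk\tBroward\n"
    "Academic Desk\tBroward\n"
    "Collaborative Desk\tBroward\n"
    "Alachua\tAlachua\n"
)


def count_pairs(lines, name_col="Name", group_col="OPTIONALDATA5"):
    """Count (group, name) pairs, looking columns up by header name."""
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter((row[group_col], row[name_col]) for row in reader)


pair_counts = count_pairs(io.StringIO(sample))
# Sorting the (library, name) keys lists each library's rows together.
for (library, name), count in sorted(pair_counts.items()):
    print(f"{library}: {name} - {count}")
```

For a real file, the same helper could be called on an open file object, for example inside with open('sample.csv', newline='') as f, since csv.DictReader accepts any iterable of lines.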