Question

My data looks like this:

key, time_bin, count
abc, 1, 200
abc, 2, 230
abc1, 1, 300
abc1, 2, 180
abc2, 1, 300
abc2, 2, 800

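So every key has the same number of time_bin values.

What I want to find is, for each time bin, the top n keys by count.

So in the example above, say I want the top 2 keys for each time bin. The answer would be: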

1=> [{"abc1": 300}, {"abc2": 300}]
2=> [{"abc2": 800}, {"abc": 230}]

What is a good way to solve this?

Answer 1

Use collections.Counter together with collections.defaultdict:

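from collections import Counter, defaultdict
import csv

# One Counter per time bin, mapping key -> total count
counts = defaultdict(Counter)

# somefilename is a placeholder for the path to your CSV file
with open(somefilename, newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header
    for row in reader:
        key, time_bin, count = row[0], int(row[1]), int(row[2])
        counts[time_bin][key] += count

# Print the two keys with the highest counts in each time bin
for time_bin in sorted(counts):
    print('{}=> {}'.format(time_bin, counts[time_bin].most_common(2)))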

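The Counter.most_common() method is especially useful here; given n, it returns the n entries with the highest counts from a Counter, and here there is one Counter per time bin.

The output format almost matches your example: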

1=> [('abc1', 300), ('abc2', 300)]
2=> [('abc2', 800), ('abc', 230)]

The difference is that .most_common() returns a list of (key, count) tuples, not dictionaries.
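
If you do want the one-entry dictionaries from the question, each tuple can be wrapped in a dict. Below is a minimal sketch of that, not part of the original answer: it uses the sample rows from the question held in memory instead of a file, and the names rows and top_n_per_bin are made up for illustration.

```python
from collections import Counter, defaultdict

# Sample rows from the question as (key, time_bin, count); held in memory
# here so the sketch runs without a CSV file.
rows = [
    ("abc", 1, 200), ("abc", 2, 230),
    ("abc1", 1, 300), ("abc1", 2, 180),
    ("abc2", 1, 300), ("abc2", 2, 800),
]


def top_n_per_bin(rows, n):
    """Return {time_bin: [{key: count}, ...]} with the n largest counts per bin.

    Hypothetical helper, named for this example only.
    """
    counts = defaultdict(Counter)
    for key, time_bin, count in rows:
        counts[time_bin][key] += count
    # most_common(n) yields (key, count) tuples; wrap each in a one-entry dict
    return {
        time_bin: [{key: count} for key, count in counter.most_common(n)]
        for time_bin, counter in counts.items()
    }


result = top_n_per_bin(rows, 2)
for time_bin in sorted(result):
    print('{}=> {}'.format(time_bin, result[time_bin]))
```

On Python 3.7 and later, keys with equal counts come back in the order they were first seen, which is why abc1 is listed before abc2 in time bin 1.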
