How do I get a single frequency count for items across different columns?

Time: 2018-06-25 06:00:01

Tags: python

I have a CSV file that looks like this:

ID L1 L2 L3 L4 X1 Y1 Z1
1 3 3 1 2 f f x
1 3 3 3 2 g f f
2 3 4 4 3 o p q

I want to keep (id, Li), where i = 1, 2, 3, 4, as the key and the frequency of occurrence as the value. I would like the output as lists such as [1, 3, 5], which means the following:

<1, 3> appeared 5 (i.e. wherever the ID was 1, the value 3 appeared in L1 and/or L2, L3, L4)
<1, 1> appeared 1
<1, 2> appeared 2
<2, 3> appeared 2
<2, 4> appeared 2

If a new entry shows up, it is added; entries that were already seen have their count incremented.

Here is what I tried:

import csv
import sys
from collections import defaultdict

csv.field_size_limit(sys.maxsize)

d = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
with open(myfile, 'r') as fi:
    for item in csv.DictReader(fi):
        for count in range(1, 5):
            d[int(item['ID'])]['L'+str(count)][item['L'+str(count)]] += 1

But this creates separate values for each of the L1-4 columns, like (34, 36, 36, 36, 56). How can I treat L1-4 as a single group per ID and count the frequency of each value across all of L1-4?

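1 answer:

Answer 0: (score: 0)

You stated your problem well with the phrase "keep (id, Li) as the key". In fact, you can do exactly that with a lesser-known Python feature: tuple objects are valid dict keys. So this will work: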

import csv
from collections import defaultdict

counts = defaultdict(int)

# accumulate counts indexed by a tuple (id, Li)
with open(myfile, 'r') as fi:  # myfile is the CSV path, as in the question
    for item in csv.DictReader(fi):
        # int() here and below assumes you want the values as integers;
        # drop it if you want them as strings straight from the CSV
        row_id = int(item["ID"])
        for l in ("L1", "L2", "L3", "L4"):
            counts[(row_id, int(item[l]))] += 1

# now all that's left is to convert each entry in counts to the list we want
for key, count in counts.items():
    # list() converts the tuple (id, Li) to [id, Li] so we can append the count
    lst = list(key) + [count]
    print(lst)
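
For comparison, here is a minimal self-contained sketch of the same tuple-key idea using collections.Counter instead of defaultdict(int). It is not part of the original answer: the SAMPLE text reuses the three rows from the question and assumes the file is comma-separated, and the names SAMPLE, L_COLUMNS and count_pairs are made up for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample: the question's rows, assumed to be comma-separated.
SAMPLE = """ID,L1,L2,L3,L4,X1,Y1,Z1
1,3,3,1,2,f,f,x
1,3,3,3,2,g,f,f
2,3,4,4,3,o,p,q
"""

L_COLUMNS = ("L1", "L2", "L3", "L4")


def count_pairs(lines):
    """Count how often each (ID, value) pair occurs across the L1-L4 columns."""
    counts = Counter()
    for row in csv.DictReader(lines):
        row_id = int(row["ID"])
        # One tuple key per L column; Counter.update tallies them in one call.
        counts.update((row_id, int(row[col])) for col in L_COLUMNS)
    return counts


counts = count_pairs(io.StringIO(SAMPLE))

# Flatten each ((id, value), n) entry into [id, value, n], sorted for stable output.
result = sorted([row_id, value, n] for (row_id, value), n in counts.items())
print(result)
# [[1, 1, 1], [1, 2, 2], [1, 3, 5], [2, 3, 2], [2, 4, 2]]
```

These counts match the frequencies listed in the question, for example <1, 3> appearing 5 times. To read a real file, pass an open file object to count_pairs in place of the io.StringIO wrapper.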