I have a CSV file like the following:

ID L1 L2 L3 L4 X1 Y1 Z1
1 3 3 1 2 f f x
1 3 3 3 2 g f f
2 3 4 4 3 o p q

I want to keep (id, Li), where i = 1, 2, 3, 4, as the key and the frequency of occurrence as the value. I want the output as lists such as

[1, 3, 5]

which means the following:

<1, 3> appeared 5 (i.e. wherever ID 1 occurs, the value 3 appears 5 times across L1, L2, L3 and L4)
<1, 1> appeared 1
<1, 2> appeared 2
<2, 3> appeared 2
<2, 4> appeared 2

If a new (id, Li) pair appears, it is added; pairs already seen have their count incremented.

Here is what I tried:

import csv
import sys
from collections import defaultdict

csv.field_size_limit(sys.maxsize)
d = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
with open(myfile, 'r') as fi:
    for item in csv.DictReader(fi):
        for count in range(1, 5):
            # d[id][column][value]: the middle 'L1'..'L4' key keeps each column separate
            d[int(item['ID'])]['L'+str(count)][item['L'+str(count)]] += 1

But this creates separate counts for each of the L1-L4 columns, like (34, 36, 36, 36, 56). How can I treat L1-L4 as one group per ID and count the frequency of their values?
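For comparison, the attempt above could pool the columns simply by dropping the middle 'L1'..'L4' level of the nested dict. The sketch below assumes the file is whitespace-separated, as the sample is displayed (a real comma-separated file would use the default delimiter), and uses hypothetical names (`SAMPLE`, `count_per_id`) with the sample data inlined so it runs on its own.

```python
import csv
import io
from collections import defaultdict

# Sample rows from the question; assumed space-separated as displayed.
SAMPLE = """ID L1 L2 L3 L4 X1 Y1 Z1
1 3 3 1 2 f f x
1 3 3 3 2 g f f
2 3 4 4 3 o p q
"""

def count_per_id(fi):
    # Hypothetical helper: d[id][value] -> count.
    # There is no per-column level, so L1-L4 are pooled for each ID.
    d = defaultdict(lambda: defaultdict(int))
    for item in csv.DictReader(fi, delimiter=" "):
        for col in ("L1", "L2", "L3", "L4"):
            d[int(item["ID"])][int(item[col])] += 1
    return d

d = count_per_id(io.StringIO(SAMPLE))
for row_id, values in d.items():
    for value, count in values.items():
        print([row_id, value, count])
# prints [1, 3, 5], [1, 1, 1], [1, 2, 2], [2, 3, 2], [2, 4, 2] (one per line)
```

Dicts preserve insertion order (Python 3.7+), so the lines come out in the order in which each (id, value) pair is first seen.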
Answer (score: 0)
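You did well to state your problem with the phrase "keeping (id, Li) as the key". You can in fact do exactly that, using a Python feature that is easy to overlook: tuples are valid dict keys (as long as their elements are hashable, as ints and strings are). So this will work: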
import csv
from collections import defaultdict

counts = defaultdict(int)
# accumulate counts indexed by a tuple (id, Li)
with open(myfile, newline='') as fi:
    for item in csv.DictReader(fi):
        id_ = int(item["ID"])  # int() here and below assumes you want the values as integers; drop it to keep them as the strings read from the CSV
        for l in ("L1", "L2", "L3", "L4"):
            counts[(id_, int(item[l]))] += 1
# now all that's left is to convert each entry in counts to the list we want
for key, count in counts.items():
    lst = list(key) + [count]  # list() converts the tuple (id, Li) to [id, Li] so we can append the count
    print(repr(lst))
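The same tuple-key idea can also be written with `collections.Counter`, whose `update()` method counts each element of an iterable. That also covers the "new entries are added, old ones are counted" requirement when more rows arrive later. This is a minimal sketch, not the answer's exact code: `SAMPLE`, `COLUMNS` and `update_counts` are hypothetical names, and the data is assumed to be space-separated as displayed in the question.

```python
import csv
import io
from collections import Counter

# Sample rows from the question; assumed space-separated as displayed.
SAMPLE = """ID L1 L2 L3 L4 X1 Y1 Z1
1 3 3 1 2 f f x
1 3 3 3 2 g f f
2 3 4 4 3 o p q
"""

COLUMNS = ("L1", "L2", "L3", "L4")

def update_counts(counts, rows):
    """Hypothetical helper: add one count per (ID, Li value) pair.

    Unseen pairs start at 1; pairs already in `counts` are incremented.
    """
    for item in rows:
        row_id = int(item["ID"])
        counts.update((row_id, int(item[col])) for col in COLUMNS)
    return counts

counts = update_counts(Counter(), csv.DictReader(io.StringIO(SAMPLE), delimiter=" "))
result = [[row_id, value, n] for (row_id, value), n in sorted(counts.items())]
print(result)
# [[1, 1, 1], [1, 2, 2], [1, 3, 5], [2, 3, 2], [2, 4, 2]]

# A later batch of rows: existing pairs are incremented, new pairs are added.
update_counts(counts, [{"ID": "2", "L1": "4", "L2": "4", "L3": "5", "L4": "3"}])
print(counts[(2, 4)], counts[(2, 5)], counts[(2, 3)])
# 4 1 3
```

Sorting the items gives a stable, ID-ordered output; drop `sorted()` if first-seen order is preferred.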