My file has the following format:
Pid,Lid
2000,150
2000,450
2000,300
2000,150
3000,100
3000,250
3000,100
Desired output:
{'2000':{'150':2,'300':1,'450':1},'3000':{'100':2,'250':1}}
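For each Pid, I am building a dictionary keyed by Pid whose value is a nested dictionary. The nested dictionary is keyed by Lid, with the frequency of that Lid as its value.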
frequency = {}
for eachline in file:
    eachline = eachline.strip()
    Pid, Lid = eachline.split(',')
    if Pid in frequency:
        frequency[Pid][Lid] = frequency[Pid][Lid] + 1
    else:
        frequency[Pid] = {Lid: 1}
print frequency
This is the code I am trying, but it does not work. Please help.
Answer 0 (score: 1)
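The code in the question fails because `frequency[Pid][Lid] + 1` reads a key that does not exist yet whenever a known Pid meets a new Lid, which raises a `KeyError`. It also never skips the `Pid,Lid` header row. Below is a minimal sketch of a fix that stays with plain dicts. The function name `count_lids` and the inlined sample data are assumptions made for illustration; with a real file you would pass the open file object instead.

```python
import io

# Sample data from the question, inlined so the sketch runs without a file.
sample = io.StringIO(
    "Pid,Lid\n2000,150\n2000,450\n2000,300\n2000,150\n"
    "3000,100\n3000,250\n3000,100\n"
)

def count_lids(lines):
    """Return {Pid: {Lid: count}} using plain dicts only (hypothetical helper)."""
    frequency = {}
    next(lines)  # skip the "Pid,Lid" header row
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        pid, lid = line.split(',')
        # setdefault creates the inner dict the first time a Pid is seen
        inner = frequency.setdefault(pid, {})
        # get(lid, 0) supplies 0 for a Lid not yet seen under this Pid
        inner[lid] = inner.get(lid, 0) + 1
    return frequency

frequency = count_lids(sample)
print(frequency)
```

This keeps the structure of the original loop; the only changes are the header skip and the two lookups that tolerate missing keys.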
You can use a nested collections.defaultdict() to store the counts, and csv.reader() to read the .csv file:
from csv import reader
from collections import defaultdict
from pprint import pprint

# create nested defaultdicts
d = defaultdict(lambda: defaultdict(dict))

# open file with context manager
with open('pids.csv') as f:
    # create csv reader object
    csv_reader = reader(f)

    # skip headers
    next(csv_reader)

    # collect counts
    for pid, lid in csv_reader:
        d[pid][lid] = d[pid].get(lid, 0) + 1

pprint(d)
Which gives the following:
defaultdict(<function <lambda> at 0x7fcf5b8a7f28>,
            {'2000': defaultdict(<class 'dict'>,
                                 {'150': 2,
                                  '300': 1,
                                  '450': 1}),
             '3000': defaultdict(<class 'dict'>, {'100': 2, '250': 1})})
You can also use collections.Counter() for the sub-dictionaries to do the counting:
from csv import reader
from collections import defaultdict
from collections import Counter
from pprint import pprint

# create defaultdict of Counters
d = defaultdict(lambda: Counter())

# open file with context manager
with open('pids.csv') as f:
    # create csv reader object
    csv_reader = reader(f)

    # skip headers
    next(csv_reader)

    # collect counts
    for pid, lid in csv_reader:
        d[pid][lid] += 1

pprint(d)
Which gives the following:
defaultdict(<function <lambda> at 0x7f2b024b7f28>,
            {'2000': Counter({'150': 2, '450': 1, '300': 1}),
             '3000': Counter({'100': 2, '250': 1})})
Note: defaultdict() and Counter() are just subclasses of dict, which means they can be treated like ordinary dictionaries.
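If you want output that matches the plain nested dict shown in the question, one option is to convert the result once counting is done. The sketch below assumes the counts were collected into a defaultdict of Counters; the rows are inlined from the question's sample data and the name `plain` is only illustrative.

```python
from collections import Counter, defaultdict

# Rows from the question's sample file, inlined for a self-contained example.
rows = [
    ('2000', '150'), ('2000', '450'), ('2000', '300'), ('2000', '150'),
    ('3000', '100'), ('3000', '250'), ('3000', '100'),
]

d = defaultdict(Counter)
for pid, lid in rows:
    d[pid][lid] += 1

# Both levels are dict subclasses, so ordinary dict operations work on them.
print(isinstance(d, dict), isinstance(d['2000'], dict))

# Convert both levels to plain dicts for printing or serialising.
plain = {pid: dict(counts) for pid, counts in d.items()}
print(plain)
```

Converting also avoids a common surprise: indexing a defaultdict with a missing key inserts that key, whereas a plain dict raises `KeyError`.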