Python program to get output as counts in a nested dictionary

Posted: 2018-08-04 06:50:25

Tags: python dictionary

My file has the following format:

Pid,Lid
2000,150
2000,450
2000,300
2000,150
3000,100
3000,250
3000,100

Desired output:

{'2000':{'150':2,'300':1,'450':1},'3000':{'100':2,'250':1}}

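For each Pid, I am building a dictionary with the Pid as the key and a nested dictionary as the value. The nested dictionary has the Lid as its key and that Lid's frequency as its value.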

frequency={}
for eachline in file:
    eachline = eachline.strip()
    Pid, Lid = eachline.split(',')
    if Pid in frequency:
        frequency[Pid][Lid]=frequency[Pid][Lid]+1
    else:
        frequency[Pid]={Lid :1}
print frequency

This is the code I am trying, but it does not work. Please help.
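The loop above fails because `frequency[Pid][Lid] + 1` reads a key that does not exist yet the first time a Lid appears under a Pid that is already in the dictionary (for example the line `2000,450`), which raises a KeyError. The header line `Pid,Lid` is also never skipped, and `print frequency` is Python 2 syntax. Below is a minimal sketch of the same plain-dict approach with those problems fixed. It assumes Python 3, and it reads the sample data from an in-memory `io.StringIO` instead of a real file so the example is self-contained. `count_lids` is a hypothetical helper name used only for illustration.

```python
import io

# Sample data from the question; in practice `lines` would be an open file object
data = io.StringIO("""Pid,Lid
2000,150
2000,450
2000,300
2000,150
3000,100
3000,250
3000,100
""")

def count_lids(lines):
    """Hypothetical helper: return {Pid: {Lid: count}} from 'Pid,Lid' lines."""
    frequency = {}
    next(lines)  # skip the "Pid,Lid" header line
    for eachline in lines:
        eachline = eachline.strip()
        if not eachline:
            continue  # ignore blank lines
        pid, lid = eachline.split(',')
        # setdefault creates the inner dict the first time a Pid is seen
        inner = frequency.setdefault(pid, {})
        # get(lid, 0) treats a Lid that has not been counted yet as 0
        inner[lid] = inner.get(lid, 0) + 1
    return frequency

frequency = count_lids(data)
print(frequency)
```

`dict.setdefault` and `dict.get` together replace the `if Pid in frequency` branch, so neither a new Pid nor a new Lid can trigger a KeyError.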

1 answer:

Answer (score: 1)

You can use a nested collections.defaultdict() to store the counts, and read the .csv file with csv.reader():

from csv import reader
from collections import defaultdict
from pprint import pprint

# create nested defaultdicts
d = defaultdict(lambda: defaultdict(dict))

# open file with context manager
with open('pids.csv') as f:

    # create csv reader object
    csv_reader = reader(f)

    # skip headers
    next(csv_reader)

    # collect counts
    for pid, lid in csv_reader:
        d[pid][lid] = d[pid].get(lid, 0) + 1

pprint(d)

This gives the following:

defaultdict(<function <lambda> at 0x7fcf5b8a7f28>,
            {'2000': defaultdict(<class 'dict'>,
                             {'150': 2,
                              '300': 1,
                              '450': 1}),
             '3000': defaultdict(<class 'dict'>, {'100': 2, '250': 1})})
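In the code above, the inner `defaultdict(dict)` never actually supplies the count's default, because `d[pid].get(lid, 0)` handles that. Accessing `d[pid][lid]` for an unseen Lid would insert an empty dict rather than 0. A common alternative uses `int` as the inner factory, so that `+= 1` works directly because `int()` returns 0. The sketch below assumes the same `Pid,Lid` file layout. An in-memory `io.StringIO` stands in for `open('pids.csv')` so the example runs on its own.

```python
import csv
import io
from collections import defaultdict

# Stand-in for open('pids.csv'); same sample data as in the question
f = io.StringIO("Pid,Lid\n2000,150\n2000,450\n2000,300\n2000,150\n"
                "3000,100\n3000,250\n3000,100\n")

csv_reader = csv.reader(f)
next(csv_reader)  # skip the header row

# Inner defaultdict(int): a Lid seen for the first time starts at int() == 0
d = defaultdict(lambda: defaultdict(int))
for pid, lid in csv_reader:
    d[pid][lid] += 1

print(d['2000']['150'])  # 2
```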

You can also use collections.Counter() to do the counting in the sub-dictionaries:

from csv import reader
from collections import defaultdict
from collections import Counter
from pprint import pprint

# create defaultdict of Counters
d = defaultdict(lambda: Counter())

# open file with context manager
with open('pids.csv') as f:

    # create csv reader object
    csv_reader = reader(f)

    # skip headers
    next(csv_reader)

    # collect counts
    for pid, lid in csv_reader:
        d[pid][lid] += 1

pprint(d)

This gives the following:

defaultdict(<function <lambda> at 0x7f2b024b7f28>,
            {'2000': Counter({'150': 2, '450': 1, '300': 1}),
             '3000': Counter({'100': 2, '250': 1})})

Note: defaultdict() and Counter() are just subclasses of dict, which means they can be treated like ordinary dictionaries.
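If you need the exact plain-dict form from the question (for example, so that printing shows no `defaultdict(...)` or `Counter(...)` wrappers), you can convert the nested containers with a dict comprehension. The following is a minimal sketch, assuming `d` was built as in the Counter example. It is rebuilt here from the sample rows so that it runs on its own. `defaultdict(Counter)` is used as an equivalent of `defaultdict(lambda: Counter())`.

```python
from collections import Counter, defaultdict

# Rebuild the defaultdict of Counters from the sample rows in the question
rows = [('2000', '150'), ('2000', '450'), ('2000', '300'), ('2000', '150'),
        ('3000', '100'), ('3000', '250'), ('3000', '100')]
d = defaultdict(Counter)
for pid, lid in rows:
    d[pid][lid] += 1

# dict(...) copies each Counter into an ordinary dict; the outer result is a plain dict too
plain = {pid: dict(counts) for pid, counts in d.items()}
print(plain)
```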