Python 3: Nested dictionary with multiple keys from csv

Time: 2015-08-05 21:47:48

Tags: python csv dictionary

The data looks like this:

id,outer,inner1,inner2,inner3
123,"Smith,John",a,b,c
123,"Smith,John",d,e,f
123,"Smith,John",g,h,i
456,"Williams,Tim",xx,yy,zz
456,"Williams,Tim",vv,ww,uu
456,"Miller,Ray",rrr,sss,ttt
456,"Miller,Ray",qqq,www,ppp

The dictionary I want is

{'123': {'Smith,John': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']},
 '456': {'Williams,Tim': ['xx', 'yy', 'zz', 'vv', 'ww', 'uu'],
         'Miller,Ray': ['rrr', 'sss', 'ttt', 'qqq', 'www', 'ppp']}}

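I tried adapting the accepted answer from "Python Creating A Nested Dictionary From CSV File", but that approach overwrites the dictionary entry on every row, so only the last row for each id ends up in the dictionary.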

3 Answers:

Answer 0: (score: 1)

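Use a collections.defaultdict: take the first element of each row as the outer dictionary key and the second element as the inner dictionary key, then add the remaining values in the row to a list stored as the inner dictionary's value: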

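import csv
from collections import defaultdict
from pprint import pprint as pp

with open("in.txt", newline="") as f:
    next(f)  # skip header
    # Missing outer keys get an inner defaultdict; missing inner keys get a list.
    d = defaultdict(lambda: defaultdict(list))
    r = csv.reader(f)
    for row in r:
        # row[0] is the id, row[1] the name, row[2:] the values to collect
        d[row[0]][row[1]].extend(row[2:])

# Convert both levels to plain dicts so the output is not wrapped in defaultdict(...)
pp({k: dict(v) for k, v in d.items()})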

Output:

{'123': {'Smith,John': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']},
 '456': {'Miller,Ray': ['rrr', 'sss', 'ttt', 'qqq', 'www', 'ppp'],
         'Williams,Tim': ['xx', 'yy', 'zz', 'vv', 'ww', 'uu']}}

Since you are using Python 3, we can unpack with * in the loop to make the code a bit nicer:

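import csv
from collections import defaultdict

with open("in.txt", newline="") as f:
    next(f)  # skip header
    d = defaultdict(lambda: defaultdict(list))
    r = csv.reader(f)
    # k1 is the id, k2 the name, vals collects every remaining column
    for k1, k2, *vals in r:
        d[k1][k2].extend(vals)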
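
For anyone who wants to try this without creating a file, here is a minimal sketch that runs the same loop over an in-memory copy of a few rows from the question. The helper name nest_rows and the use of io.StringIO in place of in.txt are my own assumptions for illustration, not part of the answer above.

```python
import csv
import io
from collections import defaultdict

# A few rows from the question; io.StringIO stands in for the file "in.txt".
sample = io.StringIO(
    'id,outer,inner1,inner2,inner3\n'
    '123,"Smith,John",a,b,c\n'
    '123,"Smith,John",d,e,f\n'
    '456,"Williams,Tim",xx,yy,zz\n'
    '456,"Miller,Ray",rrr,sss,ttt\n'
)


def nest_rows(lines):
    """Group csv rows as {first column: {second column: [remaining values]}}."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    nested = defaultdict(lambda: defaultdict(list))
    for k1, k2, *vals in reader:
        nested[k1][k2].extend(vals)
    # Return plain dicts at both levels.
    return {k1: dict(inner) for k1, inner in nested.items()}


result = nest_rows(sample)
print(result["123"]["Smith,John"])  # ['a', 'b', 'c', 'd', 'e', 'f']
```

The quoted field "Smith,John" stays in one piece because csv.reader handles the quoting, which a plain split on commas would not.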

Answer 1: (score: 0)

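Yes, because in that example each row is assigned directly to its UID key, so a later row with the same UID replaces the earlier one: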

new_data_dict[row["UID"]] = item

Instead, you can use setdefault to default the entry to a list and append to it:

new_data_dict.setdefault(row["UID"], []).append(item)
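
To make the difference concrete, here is a small sketch that runs both forms over the same rows. The rows list and the names overwritten and accumulated are made up for illustration; only the two assignment styles come from the lines above.

```python
# Hypothetical (uid, item) pairs standing in for the csv rows.
rows = [("123", "a"), ("123", "d"), ("456", "xx")]

overwritten = {}
accumulated = {}
for uid, item in rows:
    # Plain assignment: each row replaces whatever was stored for this uid.
    overwritten[uid] = item
    # setdefault: create an empty list the first time, then append every row.
    accumulated.setdefault(uid, []).append(item)

print(overwritten)  # {'123': 'd', '456': 'xx'}
print(accumulated)  # {'123': ['a', 'd'], '456': ['xx']}
```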

Answer 2: (score: 0)

dict.setdefault is a nice way to fetch data structures, creating them as needed.

import csv
import pprint

data = '''123,"Smith,John",a,b,c
123,"Smith,John",d,e,f
123,"Smith,John",g,h,i
456,"Williams,Tim",xx,yy,zz
456,"Williams,Tim",vv,ww,uu
456,"Miller,Ray",rrr,sss,ttt
456,"Miller,Ray",qqq,www,ppp
'''
data = data.splitlines()
data = csv.reader(data)

result = {}
for datum in data:
    outer = result.setdefault(datum[0], {})
    inner = outer.setdefault(datum[1], [])
    inner.extend(datum[2:])

pprint.pprint(result)
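
One practical difference from a nested defaultdict, sketched below under my own variable names (plain, auto): a structure built with setdefault is made of ordinary dicts, so looking up an id that was never read raises KeyError, whereas a defaultdict quietly creates an empty entry on lookup.

```python
from collections import defaultdict

# Built with setdefault: ordinary dicts all the way down.
plain = {}
plain.setdefault("123", {}).setdefault("Smith,John", []).extend(["a", "b", "c"])

# Built with nested defaultdicts.
auto = defaultdict(lambda: defaultdict(list))
auto["123"]["Smith,John"].extend(["a", "b", "c"])

# Looking up a missing id in the plain dict raises KeyError...
try:
    plain["999"]
    plain_raised = False
except KeyError:
    plain_raised = True

# ...while the same lookup on the defaultdict inserts an empty entry.
auto["999"]

print(plain_raised, "999" in plain, "999" in auto)  # True False True
```

Which behavior is preferable depends on whether a lookup with an unknown id should count as an error in your program.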