The data looks like this:
id,outer,inner1,inner2,inner3
123,"Smith,John",a,b,c
123,"Smith,John",d,e,f
123,"Smith,John",g,h,i
456,"Williams,Tim",xx,yy,zz
456,"Williams,Tim",vv,ww,uu
456,"Miller,Ray",rrr,sss,ttt
456,"Miller,Ray",qqq,www,ppp
The dictionary I want to end up with is:
{'123': {'Smith,John': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']},
 '456': {'Williams,Tim': ['xx', 'yy', 'zz', 'vv', 'ww', 'uu'],
         'Miller,Ray': ['rrr', 'sss', 'ttt', 'qqq', 'www', 'ppp']}}
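I tried adapting the accepted answer from "Python Creating A Nested Dictionary From CSV File", but that approach overwrites the dictionary entry on every row, so only the last row for each id ends up in the dictionary.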
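Answer 0 (score: 1)
Use a collections.defaultdict: the first element of each row is the outer dictionary key, the second element is the inner dictionary key, and the remaining values in the row are added to a list that is the value for that inner key: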
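import csv
from collections import defaultdict
from pprint import pprint as pp

with open("in.txt", newline="") as f:
    next(f)  # skip header
    # outer key -> inner key -> list of values
    d = defaultdict(lambda: defaultdict(list))
    r = csv.reader(f)
    for row in r:
        d[row[0]][row[1]].extend(row[2:])

# convert the nested defaultdicts to plain dicts so they print as shown below
pp({k: dict(v) for k, v in d.items()})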
Output:
{'123': {'Smith,John': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']},
 '456': {'Miller,Ray': ['rrr', 'sss', 'ttt', 'qqq', 'www', 'ppp'],
         'Williams,Tim': ['xx', 'yy', 'zz', 'vv', 'ww', 'uu']}}
Since you are using Python 3, we can use * unpacking in the loop to make the code a little nicer:
with open("in.txt", newline="") as f:
    next(f)  # skip header
    d = defaultdict(lambda: defaultdict(list))
    r = csv.reader(f)
    # k1 and k2 take the first two columns, vals collects the rest as a list
    for k1, k2, *vals in r:
        d[k1][k2].extend(vals)
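The loop above can also be wrapped in a small function, which makes it easy to try without a file on disk. This is only a sketch: the name nest_rows and the use of io.StringIO in place of in.txt are assumptions for illustration, not part of the original answer. It converts the nested defaultdicts to plain dicts before returning, so the result prints and compares like an ordinary dictionary.

```python
import csv
import io
from collections import defaultdict


def nest_rows(lines):
    """Group CSV rows as {first column: {second column: [remaining values]}}.

    `lines` is any iterable of CSV lines, header included.
    (Hypothetical helper, not from the original answer.)
    """
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    d = defaultdict(lambda: defaultdict(list))
    for k1, k2, *vals in reader:
        d[k1][k2].extend(vals)
    # Convert to plain dicts so the result behaves like a normal dict.
    return {k1: dict(inner) for k1, inner in d.items()}


# A shortened copy of the sample data, held in memory instead of in.txt.
sample = io.StringIO(
    'id,outer,inner1,inner2,inner3\n'
    '123,"Smith,John",a,b,c\n'
    '123,"Smith,John",d,e,f\n'
    '456,"Miller,Ray",rrr,sss,ttt\n'
)
nested = nest_rows(sample)
print(nested)
```

Skipping the header with next(reader) rather than next(f) lets the function accept any iterable of lines, not only an open file.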
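Answer 1 (score: 0)
Yes, that happens because in that example each row is assigned directly to its UID key, replacing whatever was stored there before: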
new_data_dict[row["UID"]] = item
Instead, you can use setdefault to default the entry to a list and append to it:
new_data_dict.setdefault(row["UID"], []).append(item)
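To make the difference concrete, here is a minimal sketch that runs both statements over the same rows. The rows list and its "value" field are made up for illustration; only the "UID" key and the two assignment styles come from the snippets above.

```python
# Hypothetical rows standing in for what csv.DictReader would yield.
rows = [
    {"UID": "123", "value": "a"},
    {"UID": "123", "value": "d"},
    {"UID": "456", "value": "xx"},
]

overwritten = {}
accumulated = {}
for row in rows:
    item = row["value"]
    # Plain assignment: a later row with the same UID replaces the earlier one.
    overwritten[row["UID"]] = item
    # setdefault: create an empty list on first sight, then append to it.
    accumulated.setdefault(row["UID"], []).append(item)

print(overwritten)   # {'123': 'd', '456': 'xx'}
print(accumulated)   # {'123': ['a', 'd'], '456': ['xx']}
```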
Answer 2 (score: 0)
dict.setdefault is a nice way to fetch nested data structures, creating them as needed.
import csv
import pprint
data = '''123,"Smith,John",a,b,c
123,"Smith,John",d,e,f
123,"Smith,John",g,h,i
456,"Williams,Tim",xx,yy,zz
456,"Williams,Tim",vv,ww,uu
456,"Miller,Ray",rrr,sss,ttt
456,"Miller,Ray",qqq,www,ppp
'''
data = data.splitlines()
data = csv.reader(data)
result = {}
for datum in data:
    # get (or create) the dict for this id, then the list for this name
    outer = result.setdefault(datum[0], {})
    inner = outer.setdefault(datum[1], [])
    inner.extend(datum[2:])
pprint.pprint(result)
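One practical difference between dict.setdefault and collections.defaultdict may be worth keeping in mind when choosing between them: a defaultdict creates an entry on any lookup of a missing key, while a plain dict built with setdefault still raises KeyError for keys that were never added. The short sketch below illustrates this; the keys and variable names are arbitrary examples.

```python
from collections import defaultdict

auto = defaultdict(list)
auto["123"].append("a")
_ = auto["999"]        # a plain lookup silently creates an empty list
print(sorted(auto))    # ['123', '999']

plain = {}
plain.setdefault("123", []).append("a")
print("999" in plain)  # False

try:
    plain["999"]
except KeyError:
    missing_raises = True   # a plain dict reports the missing key
else:
    missing_raises = False
print(missing_raises)  # True
```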