I have a DataFrame with two columns, Address and ID. I want to merge the IDs that share the same address into a dictionary.
import pandas as pd

df = pd.DataFrame({'Address': ['12 A', '66 C', '10 B', '10 B', '12 A', '12 A'],
                   'ID': ['Aa', 'Bb', 'Cc', 'Dd', 'Ee', 'Ff']})
# set_index + to_dict keeps only the last ID for each duplicated address
AS = df.set_index('Address')['ID'].to_dict()
print(df)
Address ID
0 12 A Aa
1 66 C Bb
2 10 B Cc
3 10 B Dd
4 12 A Ee
5 12 A Ff
print(AS)
{'66 C': 'Bb', '12 A': 'Ff', '10 B': 'Dd'}
What I want is for duplicate addresses to store multiple values, like this:
{'66 C': ['Bb'], '12 A': ['Aa','Ee','Ff'], '10 B': ['Cc','Dd']}
Answer 0 (score: 17)
I think you can use groupby with a dictionary comprehension here:
>>> df
Address ID
0 12 A Aa
1 66 C Bb
2 10 B Cc
3 10 B Dd
4 12 A Ee
5 12 A Ff
>>> {k: list(v) for k,v in df.groupby("Address")["ID"]}
{'66 C': ['Bb'], '12 A': ['Aa', 'Ee', 'Ff'], '10 B': ['Cc', 'Dd']}
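A possible variant of the same idea, as a minimal sketch: aggregate each group into a list and convert the resulting Series to a dictionary. The variable name `by_address` is only illustrative and does not appear in the answer above.

```python
import pandas as pd

df = pd.DataFrame({'Address': ['12 A', '66 C', '10 B', '10 B', '12 A', '12 A'],
                   'ID': ['Aa', 'Bb', 'Cc', 'Dd', 'Ee', 'Ff']})

# Group the IDs by address, collect each group into a list,
# then turn the resulting Series (indexed by address) into a dict.
by_address = df.groupby('Address')['ID'].agg(list).to_dict()
print(by_address)
```

Within each address the IDs keep the order in which they appear in the DataFrame.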
Answer 1 (score: 1)
In response to the comment about multiple columns:
>>> df
Address ID Name
0 12 A Aa Alpha
1 66 C Bb Bravo
2 10 B Cc Charlie
3 10 B Dd Delta
4 12 A Ee Edgar
5 12 A Ff Frank
>>> {k: v.to_dict() for k,v in df.groupby("Address")}
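With the default `to_dict()` orientation, each address maps to a nested dictionary keyed by column name and then by the original row index. If plain lists per column are preferred instead, one possible sketch follows. The `orient="list"` choice, dropping the `Address` column, and the name `nested` are assumptions made for illustration, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({'Address': ['12 A', '66 C', '10 B', '10 B', '12 A', '12 A'],
                   'ID': ['Aa', 'Bb', 'Cc', 'Dd', 'Ee', 'Ff'],
                   'Name': ['Alpha', 'Bravo', 'Charlie', 'Delta', 'Edgar', 'Frank']})

# For each address, drop the (now redundant) grouping column and
# store every remaining column as a list of values.
nested = {
    address: group.drop(columns='Address').to_dict(orient='list')
    for address, group in df.groupby('Address')
}
print(nested['12 A'])
```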