I have a DataFrame like this:
Subject_id Subject Score
Subject_1 Math 5
Subject_1 Language 4
Subject_1 Music 8
Subject_2 Math 8
Subject_2 Language 3
Subject_2 Music 9
I want to convert it into a dictionary grouped by Subject_id:
{'Subject_1': {'Math': 5,
               'Language': 4,
               'Music': 8},
 'Subject_2': {'Math': 8,
               'Language': 3,
               'Music': 9}
}
If I had only one subject, I could do this:
my_dict['Subject_1'] = dict(zip(df['Subject'],df['Score']))
But because I have several subjects, the keys repeat across rows, so I cannot use zip directly.
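To make the problem concrete, here is a minimal sketch. The DataFrame construction is an assumption that simply reproduces the table above. Zipping the whole columns keeps only the last score seen for each subject name:

```python
import pandas as pd

# Assumed reconstruction of the table shown above.
df = pd.DataFrame({
    'Subject_id': ['Subject_1'] * 3 + ['Subject_2'] * 3,
    'Subject': ['Math', 'Language', 'Music'] * 2,
    'Score': [5, 4, 8, 8, 3, 9],
})

# Later rows overwrite earlier ones because the keys repeat,
# so the Subject_1 scores are lost.
flat = dict(zip(df['Subject'], df['Score'].tolist()))
print(flat)  # {'Math': 8, 'Language': 3, 'Music': 9}
```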
DataFrames have a .to_dict('index') method, but I need to be able to group by a specific column when creating the dictionary.
How can I achieve this?
Thanks.
Answer 0: (score: 4)
Use groupby with a custom lambda function, then convert the resulting Series with to_dict:
d = (df.groupby('Subject_id')
.apply(lambda x: dict(zip(x['Subject'],x['Score'])))
.to_dict())
print (d)
{'Subject_2': {'Math': 8, 'Music': 9, 'Language': 3},
'Subject_1': {'Math': 5, 'Music': 8, 'Language': 4}}
Details:
print (df.groupby('Subject_id').apply(lambda x: dict(zip(x['Subject'],x['Score']))))
Subject_id
Subject_1 {'Math': 5, 'Music': 8, 'Language': 4}
Subject_2 {'Math': 8, 'Music': 9, 'Language': 3}
dtype: object
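As a variation on the same idea, the groups can also be iterated directly in a dict comprehension, which skips the intermediate Series. This is only a sketch: the DataFrame construction is an assumption that reproduces the question's table, and .tolist() is added so the values are plain Python ints rather than NumPy integers.

```python
import pandas as pd

# Assumed reconstruction of the question's table.
df = pd.DataFrame({
    'Subject_id': ['Subject_1'] * 3 + ['Subject_2'] * 3,
    'Subject': ['Math', 'Language', 'Music'] * 2,
    'Score': [5, 4, 8, 8, 3, 9],
})

# groupby yields (key, sub-frame) pairs; build one inner dict per group.
d = {
    subject_id: dict(zip(group['Subject'], group['Score'].tolist()))
    for subject_id, group in df.groupby('Subject_id')
}
print(d)
# {'Subject_1': {'Math': 5, 'Language': 4, 'Music': 8},
#  'Subject_2': {'Math': 8, 'Language': 3, 'Music': 9}}
```

Some recent pandas releases emit a deprecation warning when groupby(...).apply touches the grouping column; plain iteration over the groups avoids apply altogether.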
Answer 1: (score: 4)
Use to_dict with pivot:
In [29]: df.pivot(index='Subject_id', columns='Subject', values='Score').to_dict('index')
Out[29]:
{'Subject_1': {'Language': 4L, 'Math': 5L, 'Music': 8L},
'Subject_2': {'Language': 3L, 'Math': 8L, 'Music': 9L}}
Alternatively,
In [25]: df.set_index(['Subject_id', 'Subject']).unstack()['Score'].to_dict('index')
Out[25]:
{'Subject_1': {'Language': 4L, 'Math': 5L, 'Music': 8L},
'Subject_2': {'Language': 3L, 'Math': 8L, 'Music': 9L}}
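One caveat with the pivot approach: pivot raises a ValueError if the same (Subject_id, Subject) pair occurs more than once, and a combination that is missing shows up as NaN in the result. The sketch below uses an assumed variant of the question's data in which Subject_2 has no Music score, and drops the NaN entries row by row.

```python
import pandas as pd

# Assumed variant of the question's data: Subject_2 has no Music row.
df = pd.DataFrame({
    'Subject_id': ['Subject_1'] * 3 + ['Subject_2'] * 2,
    'Subject': ['Math', 'Language', 'Music', 'Math', 'Language'],
    'Score': [5, 4, 8, 8, 3],
})

wide = df.pivot(index='Subject_id', columns='Subject', values='Score')

# The missing combination becomes NaN, which also turns the scores into floats.
raw = wide.to_dict('index')

# Drop the NaN entries per row so only real scores remain.
clean = {subject_id: row.dropna().to_dict() for subject_id, row in wide.iterrows()}
print(clean)
```

If the scores must stay integers, converting them after dropna (for example with astype(int)) is one option.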
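Answer 2: (score: 0)
Building on Zero's answer, you can unpack df.columns with the asterisk (*) for more convenience, and/or filter the columns first with a list comprehension: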
import io
import pandas as pd
TESTDATA = """
Subject_id; Subject; Score
Subject_1; Math; 5
Subject_1; Language; 4
Subject_1; Music; 8
Subject_2; Math; 8
Subject_2; Language; 3
Subject_2; Music; 9
"""
df = pd.read_csv( io.StringIO(TESTDATA) , sep=";").applymap(lambda x: x.strip() if isinstance(x, str) else x)
df.pivot(*df.columns).to_dict('index')
{'Subject_1': {'Language': 4, 'Math': 5, 'Music': 8},
'Subject_2': {'Language': 3, 'Math': 8, 'Music': 9}}
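Note that from pandas 2.0 onward the arguments of DataFrame.pivot are keyword-only, so positional unpacking with * raises a TypeError there. A minimal sketch of the same idea using keyword unpacking follows. The DataFrame construction and the extra Note column are hypothetical, added only to show the list-comprehension filter.

```python
import pandas as pd

# Assumed reconstruction of the test data, plus a hypothetical 'Note'
# column that should not take part in the pivot.
df = pd.DataFrame({
    'Subject_id': ['Subject_1'] * 3 + ['Subject_2'] * 3,
    'Subject': ['Math', 'Language', 'Music'] * 2,
    'Score': [5, 4, 8, 8, 3, 9],
    'Note': ['a', 'b', 'c', 'd', 'e', 'f'],
})

# Filter the columns with a list comprehension, then map them in order
# onto pivot's keyword arguments.
cols = [c for c in df.columns if c != 'Note']
pivot_kwargs = dict(zip(('index', 'columns', 'values'), cols))

d = df.pivot(**pivot_kwargs).to_dict('index')
print(d)
```

This relies on the remaining columns being in index, columns, values order, just as the positional version does.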