Pandas DataFrame: grouping by a column

Date: 2018-01-11 13:34:04

Tags: python pandas dictionary dataframe

I have a DataFrame like this:

Subject_id    Subject    Score    
Subject_1        Math        5                 
Subject_1    Language        4                 
Subject_1       Music        8
Subject_2        Math        8                 
Subject_2    Language        3                 
Subject_2       Music        9

I want to convert it into a dictionary, grouped by Subject_id:

{'Subject_1': {'Math': 5,
               'Language': 4,
               'Music': 8},
 'Subject_2': {'Math': 8,
               'Language': 3,
               'Music': 9}
}

If I had only one subject ID, I could do this:

my_dict['Subject_1'] = dict(zip(df['Subject'],df['Score']))

But since I have several subject IDs, the keys in the Subject column repeat, so I can't use zip directly.
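To make the problem concrete, here is a minimal sketch that rebuilds the sample data (the construction via pd.DataFrame is an assumption; the question does not show how df was created) and zips the whole frame at once. Later rows overwrite earlier ones that share a key, so only the last subject ID's scores survive.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand for illustration.
df = pd.DataFrame({
    "Subject_id": ["Subject_1"] * 3 + ["Subject_2"] * 3,
    "Subject": ["Math", "Language", "Music"] * 2,
    "Score": [5, 4, 8, 8, 3, 9],
})

# Zipping the full columns: repeated 'Subject' keys are overwritten,
# so the Subject_1 scores are lost.
flat = dict(zip(df["Subject"], df["Score"]))
print(flat)
```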

DataFrames have a .to_dict('index') method, but I need to be able to group by a specific column when building the dictionary.

How can I achieve this?

Thanks.

3 answers:

Answer 0: (score: 4)

Use groupby with a custom lambda function, then convert the resulting Series with to_dict:

d = (df.groupby('Subject_id')
       .apply(lambda x: dict(zip(x['Subject'],x['Score'])))
       .to_dict())

print (d)
{'Subject_2': {'Math': 8, 'Music': 9, 'Language': 3}, 
 'Subject_1': {'Math': 5, 'Music': 8, 'Language': 4}}

Details:

print (df.groupby('Subject_id').apply(lambda x: dict(zip(x['Subject'],x['Score']))))

Subject_id
Subject_1    {'Math': 5, 'Music': 8, 'Language': 4}
Subject_2    {'Math': 8, 'Music': 9, 'Language': 3}
dtype: object
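The same grouping can also be written without apply, by iterating over the groups in a dict comprehension. This is only a sketch of an alternative, not part of the original answer; the DataFrame construction and the sort=False choice are assumptions made for illustration.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand for illustration.
df = pd.DataFrame({
    "Subject_id": ["Subject_1"] * 3 + ["Subject_2"] * 3,
    "Subject": ["Math", "Language", "Music"] * 2,
    "Score": [5, 4, 8, 8, 3, 9],
})

# Iterating a groupby yields (key, sub-frame) pairs; build one inner
# dict per group. sort=False keeps groups in order of first appearance.
d = {
    subject_id: dict(zip(group["Subject"], group["Score"]))
    for subject_id, group in df.groupby("Subject_id", sort=False)
}
print(d)
```

If a Subject appears more than once within the same Subject_id, the last score silently wins, exactly as with the lambda version.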

Answer 1: (score: 4)

Use pivot together with to_dict:

In [29]: df.pivot(index='Subject_id', columns='Subject', values='Score').to_dict('index')
Out[29]:
{'Subject_1': {'Language': 4L, 'Math': 5L, 'Music': 8L},
 'Subject_2': {'Language': 3L, 'Math': 8L, 'Music': 9L}}

Alternatively,

In [25]: df.set_index(['Subject_id', 'Subject']).unstack()['Score'].to_dict('index')
Out[25]:
{'Subject_1': {'Language': 4L, 'Math': 5L, 'Music': 8L},
 'Subject_2': {'Language': 3L, 'Math': 8L, 'Music': 9L}}
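For reference, a self-contained sketch of both routes on Python 3, with pivot called through keyword arguments. The DataFrame construction is an assumption for illustration, and the unstack route is written here on the Score Series rather than on the full frame.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand for illustration.
df = pd.DataFrame({
    "Subject_id": ["Subject_1"] * 3 + ["Subject_2"] * 3,
    "Subject": ["Math", "Language", "Music"] * 2,
    "Score": [5, 4, 8, 8, 3, 9],
})

# Route 1: reshape to one row per Subject_id, one column per Subject.
wide = df.pivot(index="Subject_id", columns="Subject", values="Score")
d = wide.to_dict("index")

# Route 2: the same wide table via a two-level index and unstack.
wide2 = df.set_index(["Subject_id", "Subject"])["Score"].unstack()
d2 = wide2.to_dict("index")

print(d)
```

Unlike the zip-based approach, pivot raises a ValueError when a (Subject_id, Subject) pair occurs more than once, so duplicates do not pass unnoticed.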

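Answer 2: (score: 0)

Adding to Zero's answer: for convenience you can unpack df.columns with the asterisk (*), and/or filter the columns first with a list comprehension over df.columns.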

import io 
import pandas as pd

TESTDATA = """
Subject_id;    Subject;    Score    
Subject_1;        Math;        5                 
Subject_1;    Language;        4                 
Subject_1;       Music;        8
Subject_2;        Math;        8                 
Subject_2;    Language;        3                 
Subject_2;       Music;        9

"""
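# Strip the padding around the string cells.
df = pd.read_csv(io.StringIO(TESTDATA), sep=";").applymap(lambda x: x.strip() if isinstance(x, str) else x)

# Unpack the three columns as pivot's index, columns and values.
# Note: pandas 2.0+ accepts these only as keyword arguments
# (index=, columns=, values=), so positional unpacking needs pandas < 2.0.
df.pivot(*df.columns).to_dict('index')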

{'Subject_1': {'Language': 4, 'Math': 5, 'Music': 8},
'Subject_2': {'Language': 3, 'Math': 8, 'Music': 9}}
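The filtering idea mentioned in this answer could look like the sketch below. The extra Teacher column is hypothetical, added only so that there is something to filter out. The three remaining column names are mapped onto pivot's keyword arguments with double-asterisk unpacking, so the call does not depend on positional arguments being accepted.

```python
import pandas as pd

# Sample data from the question plus a hypothetical 'Teacher' column.
df = pd.DataFrame({
    "Subject_id": ["Subject_1"] * 3 + ["Subject_2"] * 3,
    "Teacher": ["A", "B", "C", "A", "B", "C"],  # hypothetical, for illustration
    "Subject": ["Math", "Language", "Music"] * 2,
    "Score": [5, 4, 8, 8, 3, 9],
})

# Keep only the columns that should take part in the pivot.
cols = [c for c in df.columns if c != "Teacher"]

# Map the remaining names, in order, to pivot's keyword arguments.
kwargs = dict(zip(("index", "columns", "values"), cols))
d = df.pivot(**kwargs).to_dict("index")
print(d)
```

This relies on the filtered columns being in the order index, columns, values; if the frame is laid out differently, the names should be passed explicitly.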