Grouping by multiple categories in pandas

Time: 2020-08-19 18:39:24

Tags: python python-3.x pandas dataframe

I have a dataset:

df
comment                     date        experience    approach  type     banana  apple  score
fruits are healthy banana   2010-01-19  Intermediate  fitness   athlete  True    False  0.88
i love apples               2010-01-19  Expert        athlete   False    False   True   0.10

Is it possible to create a summary table like the one below?

date        fruit   type          average_score_perdate_per_type
2010-01-19  banana  intermediate  0.88
2010-01-19  banana  fitness       0.88
2010-01-19  apple   Expert        0.10
2010-01-19  apple   Athlete       0.10

I tried:

df = df.groupby(['date', 'experience'])['score'].transform('mean')

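1 answer:

Answer 0: (score: 0)

I'm not sure how you built the "type" column in your example summary table, but (based on the data shown) assuming it combines the "experience" and "approach" columns, you can get the same summary with the following code: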

import pandas as pd

# Rebuild the two sample rows from the question
data = [["fruits are healthy banana","2010-01-19","Intermediate","fitness","athlete",True,False,0.88],
        ["i love apples","2010-01-19","Expert","athlete",False,False,True,0.10]]

df = pd.DataFrame(data, columns=["comment","date","experience","approach","type","banana","apple","score"])

# Name of the fruit column that is True in each row
fruits = ['banana', 'apple']
df['fruit'] = df[fruits].idxmax(axis=1)
# Collect 'experience' and 'approach' into one list per row
df['type_2'] = df[['experience','approach']].apply(list, axis=1)

# One row per list element, then average the score per date, fruit and type
df.explode('type_2').groupby(['date','fruit','type_2']).agg({'score':'mean'}).rename(columns={'score':'avg_score'})

(Optional) You can append .reset_index() at the end to turn the pandas.MultiIndex back into regular columns.
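
If a flat table is what you want from the start, the same idea can be sketched with melt instead of explode. This is only a rough alternative under the same assumption as above (that "type" is built from "experience" and "approach"), and it further assumes exactly one fruit flag is True per row. The names type_2, long_df and summary are just illustrative.

```python
import pandas as pd

# Same two sample rows as in the question.
data = [
    ["fruits are healthy banana", "2010-01-19", "Intermediate", "fitness", "athlete", True, False, 0.88],
    ["i love apples", "2010-01-19", "Expert", "athlete", False, False, True, 0.10],
]
df = pd.DataFrame(
    data,
    columns=["comment", "date", "experience", "approach", "type", "banana", "apple", "score"],
)

fruits = ["banana", "apple"]
# Cast the boolean flags to int so idxmax returns the column holding True.
# Assumption: exactly one fruit flag is True per row.
df["fruit"] = df[fruits].astype(int).idxmax(axis=1)

# melt stacks 'experience' and 'approach' into a single column,
# producing one row per (original row, category value).
# 'type_2' is an illustrative column name, not something from the question.
long_df = df.melt(
    id_vars=["date", "fruit", "score"],
    value_vars=["experience", "approach"],
    value_name="type_2",
)

# as_index=False keeps the group keys as ordinary columns,
# so no reset_index() is needed afterwards.
summary = (
    long_df.groupby(["date", "fruit", "type_2"], as_index=False)["score"]
    .mean()
    .rename(columns={"score": "avg_score"})
)
print(summary)
```

With the two sample rows this gives four rows, one per date, fruit and category value. The category values keep their original capitalization, so you would need something like .str.lower() on type_2 if "Athlete" and "athlete" should fall into the same group.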
