I have a dataset:
df
comment                    date        experience    approach  type     banana  apple  score
fruits are healthy banana  2010-01-19  Intermediate  fitness   athlete  True    False  0.88
i love apples              2010-01-19  Expert        athlete   False    False   True   0.10
Is it possible to create a summary table like the one below?
date fruit type average_score_perdate_per_type
2010-01-19 banana intermediate 0.88
2010-01-19 banana fitness 0.88
2010-01-19 apple Expert 0.10
2010-01-19 apple Athlete 0.10
I tried:
df = df.groupby(['date', 'experience'])['score'].transform('mean')
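Answer 0 (score: 0)
I'm not sure how you built the "type" column in your example summary table, but (judging by the data shown) assuming it is a combination of the "experience" and "approach" columns, you can get the same summary with the following code: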
import pandas as pd

# Rebuild the sample data from the question
data = [["fruits are healthy banana","2010-01-19","Intermediate","fitness","athlete",True,False,0.88],
["i love apples","2010-01-19","Expert","athlete",False,False,True,0.10]]
df = pd.DataFrame(data, columns=["comment","date","experience","approach","type","banana","apple","score"])
fruits = ['banana', 'apple']
# Name of the fruit column that is True in each row
df['fruit'] = df[fruits].idxmax(axis=1)
# Combine 'experience' and 'approach' into one list per row
df['type_2'] = df[['experience','approach']].apply(list, axis=1)
# One row per list element, then the mean score per date, fruit, and type
df.explode('type_2').groupby(['date','fruit','type_2']).agg({'score':'mean'}).rename(columns={'score':'avg_score'})
(Optional) You can append .reset_index() at the end to get rid of the pandas.MultiIndex.
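For reference, here is a minimal end-to-end sketch of the same idea with .reset_index() applied, so the result is a flat table. It still assumes that "type" means the combination of "experience" and "approach". The names summary and avg_score are only illustrative, and the .astype(int) cast is an extra precaution so that idxmax works on plain integers rather than booleans.

```python
import pandas as pd

# Sample data as given in the question
data = [
    ["fruits are healthy banana", "2010-01-19", "Intermediate", "fitness", "athlete", True, False, 0.88],
    ["i love apples", "2010-01-19", "Expert", "athlete", False, False, True, 0.10],
]
df = pd.DataFrame(
    data,
    columns=["comment", "date", "experience", "approach", "type", "banana", "apple", "score"],
)

fruits = ["banana", "apple"]
# idxmax returns the first column holding the row maximum, i.e. the first True flag.
# Caveat: a row with no flag set would be labelled with the first fruit column.
df["fruit"] = df[fruits].astype(int).idxmax(axis=1)

# Assumption: "type" is the pair of values from 'experience' and 'approach'
df["type_2"] = df[["experience", "approach"]].apply(list, axis=1)

summary = (
    df.explode("type_2")                      # one row per (comment, type value)
    .groupby(["date", "fruit", "type_2"])
    .agg(avg_score=("score", "mean"))         # named aggregation instead of rename
    .reset_index()                            # flatten the MultiIndex into columns
)
print(summary)
```

Note that the values in type_2 keep their original casing ("Intermediate", "fitness", ...); if you want them normalised as in your expected table, you could apply .str.lower() or .str.capitalize() to that column after the explode step.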