I have a DataFrame in the following format:
Name Score Bin
John 90 80-100
Marc 30 20-40
John 10 0-20
David 20 0-20
...
I want to create a pivot table that looks like this:
Name   0-20  20-40  40-60  60-80  80-100  Total count  Avg score
John      1      2    nan    nan       2            5      60.53
Marc    nan      2    nan    nan     nan            2      32.13
David     3      2    nan    nan     nan            5      21.80
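In other words, I want one column per bin showing how many values fall into that bin, plus the total count and the average score for each name.
I tried: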
table = pd.pivot_table(df, values=['Score', 'Bin'], index=['Name'],
                       aggfunc={'Score': np.average, 'Bin': 'count'},
                       dropna=True, margins=True)
But this only gives me the overall count, not a breakdown by bin.
Answer 0 (score: 0)
You can do this in three steps:
Generate the pivot table of counts per bin:
df2 = pd.pivot_table(df, index='Name', columns='Bin', values='Score', aggfunc='count')\
.reindex(columns=['0-20', '20-40', '40-60', '60-80', '80-100'])\
.rename_axis(columns='')
For your source data, extended so that it roughly yields your expected output, the result is:
       0-20  20-40  40-60  60-80  80-100
Name
David   3.0    2.0    NaN    NaN     NaN
John    1.0    2.0    NaN    NaN     2.0
Marc    NaN    2.0    NaN    NaN     NaN
Note: because NaN is a special float value, the other values in these columns are floats as well.
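If the float counts are a nuisance, here is a minimal sketch of two ways to get integer counts back. It assumes a small sample frame built only from the four rows shown in the question; the names `bins`, `counts`, `counts_nullable`, and `counts_zero` are mine, not part of the original answer.

```python
import pandas as pd

# Sample data: only the four rows visible in the question (an assumption for illustration).
df = pd.DataFrame({
    "Name": ["John", "Marc", "John", "David"],
    "Score": [90, 30, 10, 20],
    "Bin": ["80-100", "20-40", "0-20", "0-20"],
})

bins = ["0-20", "20-40", "40-60", "60-80", "80-100"]

# Same pivot as in the step above: count of scores per name and bin.
counts = (
    pd.pivot_table(df, index="Name", columns="Bin", values="Score", aggfunc="count")
    .reindex(columns=bins)
    .rename_axis(columns="")
)

# Option 1: pandas' nullable integer dtype keeps the gaps as <NA> but stores integers.
counts_nullable = counts.astype("Int64")

# Option 2: treat a missing combination as a count of zero and use plain ints.
counts_zero = counts.fillna(0).astype(int)
```

Which option fits depends on whether an empty bin should read as "missing" or as zero; the expected output in the question shows nan, which is closer to option 1.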
Compute the total count and the average score:
df3 = df.groupby('Name')\
.agg(Total_count=('Score', 'count'), Avg_score=('Score', 'mean'))\
.rename(columns={'Total_count': 'Total count', 'Avg_score': 'Avg score'})
The result is:
       Total count  Avg score
Name
David            5       21.8
John             5       61.0
Marc             2       32.0
Join the two tables above:
result = df2.join(df3)
The result is:
       0-20  20-40  40-60  60-80  80-100  Total count  Avg score
Name
David   3.0    2.0    NaN    NaN     NaN            5       21.8
John    1.0    2.0    NaN    NaN     2.0            5       61.0
Marc    NaN    2.0    NaN    NaN     NaN            2       32.0