Pandas pivot table value counts

Time: 2020-01-15 15:05:04

Tags: pandas pivot pivot-table

I have a dataframe in the following format:

 Name     Score    Bin
 John     90       80-100
 Marc     30       20-40
 John     10       0-20
 David    20       0-20

...
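To experiment with the code in this thread, the four rows visible above can be rebuilt as a DataFrame named `df`. This is a sketch only: the rows elided by "..." are not reproduced, and deriving `Bin` from `Score` with `pd.cut` (right-closed 20-point intervals) is an assumption, since the question does not say how `Bin` was produced. It does match the four rows shown (for example 20 falls in "0-20" and 30 in "20-40").

```python
import pandas as pd

# The four rows shown in the question; the elided "..." rows are not included.
df = pd.DataFrame({
    "Name": ["John", "Marc", "John", "David"],
    "Score": [90, 30, 10, 20],
})

# Assumption: Bin is derived from Score using right-closed intervals
# (0, 20], (20, 40], ...; include_lowest=True also puts a score of 0 in "0-20".
edges = [0, 20, 40, 60, 80, 100]
labels = ["0-20", "20-40", "40-60", "60-80", "80-100"]
df["Bin"] = pd.cut(df["Score"], bins=edges, labels=labels, include_lowest=True).astype(str)
print(df)
```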

I want to create a pivot table that looks like this:

Name    0-20    20-40    40-60    60-80    80-100   Total count   Avg score
John     1       2        nan      nan      2            5         60.53
Marc    nan      2        nan      nan     nan           2         32.13
David   3        2        nan      nan     nan           5         21.80

So I want one column per bin showing how many values fall in that bin, plus the total count of values and the average score.

I tried:

import numpy as np
import pandas as pd
table = pd.pivot_table(df, values=['Score', "Bin"], index=["Name"],
                   aggfunc={"Score" : np.average, "Bin" : "count"},
                    dropna=True, margins = True)

But I only get the overall count, not a count broken down by each bin.
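The attempt above passes no `columns=` argument, so every aggregation is grouped by `Name` alone and `"count"` on `Bin` can only return one overall count per person. The sketch below contrasts that with passing `columns="Bin"`, which spreads the rows across one column per bin value. It uses a small hand-built `df` (an assumption, mirroring the four visible rows) so it runs on its own.

```python
import pandas as pd

# Assumed sample data: the four rows visible in the question.
df = pd.DataFrame({
    "Name": ["John", "Marc", "John", "David"],
    "Score": [90, 30, 10, 20],
    "Bin": ["80-100", "20-40", "0-20", "0-20"],
})

# No columns=...: rows are grouped by Name only, so counting Bin
# gives a single total per person.
overall = pd.pivot_table(df, index="Name", values="Bin", aggfunc="count")

# columns="Bin": one output column per distinct Bin value; counting Score
# gives the number of rows in each (Name, Bin) cell, NaN where there are none.
per_bin = pd.pivot_table(df, index="Name", columns="Bin", values="Score", aggfunc="count")

print(overall)
print(per_bin)
```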

1 answer:

Answer 0 (score: 0)

Do it in three steps:

  1. Generate the pivot table of per-bin counts:

    df2 = pd.pivot_table(df, index='Name', columns='Bin', values='Score', aggfunc='count')\
        .reindex(columns=['0-20', '20-40', '40-60', '60-80', '80-100'])\
        .rename_axis(columns='')
    

    Using your source data, extended so that it roughly reproduces your expected output, the result is:

           0-20  20-40  40-60  60-80  80-100
    Name                                    
    David   3.0    2.0    NaN    NaN     NaN
    John    1.0    2.0    NaN    NaN     2.0
    Marc    NaN    2.0    NaN    NaN     NaN
    

    Note: NaN is a float value, so any column containing NaN is upcast to float. That is why the counts appear as 3.0, 2.0, and so on.

  2. Generate the total count and the average score:

    df3 = df.groupby('Name')\
        .agg(Total_count=('Score', 'count'), Avg_score=('Score', 'mean'))\
        .rename(columns={'Total_count': 'Total count', 'Avg_score': 'Avg score'})
    

    The result is:

           Total count  Avg score
    Name                         
    David            5       21.8
    John             5       61.0
    Marc             2       32.0
    
  3. Join the two tables above:

    result = df2.join(df3)
    

    The result is:

           0-20  20-40  40-60  60-80  80-100  Total count  Avg score
    Name                                                            
    David   3.0    2.0    NaN    NaN     NaN            5       21.8
    John    1.0    2.0    NaN    NaN     2.0            5       61.0
    Marc    NaN    2.0    NaN    NaN     NaN            2       32.0
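The three steps can be combined into one self-contained sketch. It assumes a small `df` built from the four rows visible in the question, and it differs from the answer in two small ways: it clears the column-axis name with `columns.name = None`, and, as an optional addition that is not part of the answer, it converts the count columns to pandas' nullable `Int64` dtype so that missing cells show as `<NA>` while the counts stay integers (addressing the float upcasting mentioned in the note).

```python
import pandas as pd

# Assumed sample data: the four rows visible in the question.
df = pd.DataFrame({
    "Name": ["John", "Marc", "John", "David"],
    "Score": [90, 30, 10, 20],
    "Bin": ["80-100", "20-40", "0-20", "0-20"],
})
bins = ["0-20", "20-40", "40-60", "60-80", "80-100"]

# Step 1: per-bin counts, reindexed so every bin appears as a column.
counts = pd.pivot_table(df, index="Name", columns="Bin", values="Score", aggfunc="count")
counts = counts.reindex(columns=bins)
counts.columns.name = None
# Optional: nullable integers keep empty cells as <NA> instead of forcing float.
counts = counts.astype("Int64")

# Step 2: total count and mean score per name. Dictionary unpacking lets
# the output column names contain spaces.
totals = df.groupby("Name").agg(**{
    "Total count": ("Score", "count"),
    "Avg score": ("Score", "mean"),
})

# Step 3: join the two tables on their shared Name index.
result = counts.join(totals)
print(result)
```

With only the four sample rows, John has one score in "0-20" and one in "80-100", so his total count is 2 and his average is 50.0.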