Pandas:将列设置为等于另一列的分组总和

时间:2016-08-10 09:45:18

标签: python pandas

我有一个pandas数据帧如下:

   variable        Total_Beneficiaries  gender_text age_group
0   MC_OLD_18>F>5       1534    Female  5 - 18
1   MC_OLD_18>M>5       1651    Male    5 - 18
2   MC_OLD_F>0<=1       192     Female  0 - 1
3   MC_OLD_F>18         2079    Female  18 - 60
4   MC_OLD_F>=60        171     Female  Over 60
5   MC_OLD_F_Child<5    593     Female  1 - 5
6   MC_OLD_M>0<=1       264     Male    0 - 1
7   MC_OLD_M>18         1892    Male    18 - 60
8   MC_OLD_M>=60        106     Male    Over 60
9   MC_OLD_M_Child<5    602     Male    1 - 5

我想添加一列age_group_totals,该列将是Total_Beneficiariesage group之间的总和。因此对于前两行,值为3185

到目前为止,我一直在创建一个带有总和的新数据框并重新合并到原文中,如下所示:

total_by_age = izmir_agg[['age_group','Total_Beneficiaries']].groupby('age_group').agg({'Total_Beneficiaries':np.sum}).reset_index().rename(columns={'Total_Beneficiaries':'age_group_totals'})
izmir_agg = izmir_agg.merge(total_by_age,how='left',on='age_group')

这看起来很笨重,我想知道是否有办法更直接地添加此列而不创建单独的数据帧。我试过这个:

izmir_agg['age_group_totals'] = izmir_agg.groupby('age_group')['Total_Beneficiaries'].sum().tolist()

但它不起作用,因为它返回错误长度的列表。有关如何一步完成此任务的任何提示?

1 个答案:

答案 0 :(得分:2)

尝试:

izmir_agg['agg_totals'] = izmir_agg.groupby('age_group').Total_Beneficiaries.transform('sum')
izmir_agg

enter image description here