I have a DataFrame like this:
offer_id affiliate_id affiliate_source affiliate_sub5 advertiser_id payout_cents revenue_cents
428572 1327 14331605 14331605 291 50 30
428572 1327 1465 1465 291 50 30
428572 1327 1336 1336 291 50 30
428572 1327 14331605 14331605 291 50 30
428572 1327 14331605 14331605 291 50 30
I am grouping by three columns: "offer_id", "affiliate_id", and "affiliate_source".
Code:
df1.groupby(['offer_id', 'affiliate_id', 'affiliate_source'])[["payout_cents", "revenue_cents"]].sum()
Result:
offer_id affiliate_id affiliate_source payout_cents revenue_cents
428572 1327 14331605 150 90
1465 50 30
1336 50 30
Question: after the groupby operation I also need the other columns, such as "advertiser_id" and "affiliate_sub5".
Thanks.
Answer 0: (score: 1)
I think you need add_suffix together with join:
cols = ['offer_id', 'affiliate_id', 'affiliate_source']
df2 = df1.groupby(cols)[["payout_cents", "revenue_cents"]].sum().add_suffix('_sum')
df = df1.join(df2, on=cols)
print(df)
offer_id affiliate_id affiliate_source affiliate_sub5 advertiser_id \
0 428572 1327 14331605 14331605 291
1 428572 1327 1465 1465 291
2 428572 1327 1336 1336 291
3 428572 1327 14331605 14331605 291
4 428572 1327 14331605 14331605 291
payout_cents revenue_cents payout_cents_sum revenue_cents_sum
0 50 30 150 90
1 50 30 50 30
2 50 30 50 30
3 50 30 150 90
4 50 30 150 90
Another solution with transform:
cols = ['offer_id', 'affiliate_id', 'affiliate_source']
cols1 = ["payout_cents", "revenue_cents"]
df1[['{}_sum'.format(x) for x in cols1]] = df1.groupby(cols)[cols1].transform('sum')
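To make the transform approach easier to check, here is a minimal self-contained sketch. It rebuilds the sample data from the question by hand (the integer dtypes are an assumption, since the question does not show them) and attaches the per-group sums to a copy of the frame instead of assigning in place. The names sums and result are only illustrative.

```python
import pandas as pd

# Sample data retyped from the question; integer dtypes are assumed.
df1 = pd.DataFrame({
    "offer_id": [428572] * 5,
    "affiliate_id": [1327] * 5,
    "affiliate_source": [14331605, 1465, 1336, 14331605, 14331605],
    "affiliate_sub5": [14331605, 1465, 1336, 14331605, 14331605],
    "advertiser_id": [291] * 5,
    "payout_cents": [50] * 5,
    "revenue_cents": [30] * 5,
})

cols = ["offer_id", "affiliate_id", "affiliate_source"]
cols1 = ["payout_cents", "revenue_cents"]

# transform("sum") returns one value per original row, aligned on df1's index,
# so every row keeps all of its other columns.
sums = df1.groupby(cols)[cols1].transform("sum").add_suffix("_sum")

# Put the original columns and the per-group sums side by side.
result = pd.concat([df1, sums], axis=1)
print(result)
```

Because the result has the same number of rows as df1, this suits cases where the totals are wanted next to each original record rather than as one row per group.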
Answer 1: (score: 0)
Then include those columns in groupby() as well:
df1.groupby(['offer_id', 'affiliate_id', 'affiliate_source', 'advertiser_id', 'affiliate_sub5'])[["payout_cents", "revenue_cents"]].sum()
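Note that adding columns to the groupby keys only keeps the same groups if those columns are constant within each original group. If that cannot be guaranteed, one possible alternative (not from the answers above) is to keep the three original keys and carry the extra columns along with "first" through named aggregation, which is available in pandas 0.25 and later. The sketch below assumes the sample data from the question with integer dtypes; the name out is only illustrative.

```python
import pandas as pd

# Sample data retyped from the question; integer dtypes are assumed.
df1 = pd.DataFrame({
    "offer_id": [428572] * 5,
    "affiliate_id": [1327] * 5,
    "affiliate_source": [14331605, 1465, 1336, 14331605, 14331605],
    "affiliate_sub5": [14331605, 1465, 1336, 14331605, 14331605],
    "advertiser_id": [291] * 5,
    "payout_cents": [50] * 5,
    "revenue_cents": [30] * 5,
})

cols = ["offer_id", "affiliate_id", "affiliate_source"]

# One row per group: sum the money columns and keep the first value seen
# for the extra columns. Taking "first" assumes those columns do not vary
# within a group; if they do, the other values are silently dropped.
out = df1.groupby(cols, as_index=False).agg(
    affiliate_sub5=("affiliate_sub5", "first"),
    advertiser_id=("advertiser_id", "first"),
    payout_cents=("payout_cents", "sum"),
    revenue_cents=("revenue_cents", "sum"),
)
print(out)
```

With as_index=False the group keys come back as ordinary columns, and the groups are sorted by key by default.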