Pandas: group by three columns but keep all other columns

Time: 2018-01-08 14:27:02

Tags: python pandas dataframe

I have a DataFrame like this:

offer_id  affiliate_id  affiliate_source  affiliate_sub5  advertiser_id  payout_cents  revenue_cents
  428572          1327          14331605        14331605            291            50             30
  428572          1327              1465            1465            291            50             30
  428572          1327              1336            1336            291            50             30
  428572          1327          14331605        14331605            291            50             30
  428572          1327          14331605        14331605            291            50             30

I am grouping by three columns: "offer_id", "affiliate_id", and "affiliate_source".

Operation:

df1.groupby(['offer_id', 'affiliate_id', 'affiliate_source'])[["payout_cents", "revenue_cents"]].sum()

Result:

offer_id  affiliate_id  affiliate_source  payout_cents  revenue_cents
428572    1327          1336                        50             30
                        1465                        50             30
                        14331605                   150             90

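Question: After the groupby operation, I also need the other columns, such as "advertiser_id" and "affiliate_sub5".

Thanks.

2 Answers:

Answer 0: (score: 1)

I think you need join together with add_suffix: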

cols = ['offer_id', 'affiliate_id', 'affiliate_source']
df2 = df1.groupby(cols)[["payout_cents", "revenue_cents"]].sum().add_suffix('_sum')
df = df1.join(df2, on=cols)

print (df)
   offer_id  affiliate_id  affiliate_source  affiliate_sub5  advertiser_id  \
0    428572          1327          14331605        14331605            291   
1    428572          1327              1465            1465            291   
2    428572          1327              1336            1336            291   
3    428572          1327          14331605        14331605            291   
4    428572          1327          14331605        14331605            291   

   payout_cents  revenue_cents  payout_cents_sum  revenue_cents_sum  
0            50             30               150                 90  
1            50             30                50                 30  
2            50             30                50                 30  
3            50             30               150                 90  
4            50             30               150                 90  
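
To make the join approach easier to try out, here is a minimal self-contained sketch. It assumes the sample data from the question, rebuilt by hand with lower-case column names so that they match the code. The point it illustrates is that df2 is indexed by the three group keys, so join(on=cols) looks up each row of df1 in that index and no rows are lost.

```python
import pandas as pd

# Sample data rebuilt from the question (an assumption: column names
# are lower-cased so they match the names used in the groupby call).
df1 = pd.DataFrame({
    "offer_id": [428572] * 5,
    "affiliate_id": [1327] * 5,
    "affiliate_source": [14331605, 1465, 1336, 14331605, 14331605],
    "affiliate_sub5": [14331605, 1465, 1336, 14331605, 14331605],
    "advertiser_id": [291] * 5,
    "payout_cents": [50] * 5,
    "revenue_cents": [30] * 5,
})

cols = ["offer_id", "affiliate_id", "affiliate_source"]

# One row per group; the group keys become a MultiIndex.
df2 = df1.groupby(cols)[["payout_cents", "revenue_cents"]].sum().add_suffix("_sum")

# Match df1's key columns against df2's index; df1 keeps all of its rows.
df = df1.join(df2, on=cols)
print(df[["affiliate_source", "advertiser_id", "payout_cents_sum", "revenue_cents_sum"]])
```

The suffix matters here: without add_suffix('_sum') the summed columns would have the same names as the originals, and join would raise an error about overlapping columns.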

Another solution with transform:

cols = ['offer_id', 'affiliate_id', 'affiliate_source']
cols1 = ["payout_cents", "revenue_cents"]

df1[['{}_sum'.format(x) for x in cols1]] = df1.groupby(cols)[cols1].transform('sum')
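
The transform variant can be checked with a short sketch as well. It again assumes the hand-built sample frame from the question, and the intermediate name sums is only illustrative. What it shows is that transform('sum') returns a result with the same index as df1, which is why it can be assigned straight back as new columns.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed, lower-case column names).
df1 = pd.DataFrame({
    "offer_id": [428572] * 5,
    "affiliate_id": [1327] * 5,
    "affiliate_source": [14331605, 1465, 1336, 14331605, 14331605],
    "affiliate_sub5": [14331605, 1465, 1336, 14331605, 14331605],
    "advertiser_id": [291] * 5,
    "payout_cents": [50] * 5,
    "revenue_cents": [30] * 5,
})

cols = ["offer_id", "affiliate_id", "affiliate_source"]
cols1 = ["payout_cents", "revenue_cents"]

# transform broadcasts each group's sum back to every row of that group,
# so the result has one row per original row, not one row per group.
sums = df1.groupby(cols)[cols1].transform("sum")
assert sums.index.equals(df1.index)

# Attach the per-group sums as new columns next to the untouched originals.
df1[["{}_sum".format(x) for x in cols1]] = sums
print(df1)
```

Compared with join, this avoids building a separate aggregated frame, at the cost of modifying df1 in place.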

Answer 1: (score: 0)

Then include those columns in groupby() as well:
df1.groupby(['offer_id', 'affiliate_id', 'affiliate_source', 'advertiser_id', 'affiliate_sub5'])[["payout_cents", "revenue_cents"]].sum()
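
One caveat with adding the extra columns to the grouping keys: it only gives the same sums if those columns are constant within each original group, as they happen to be in the sample data. The sketch below, which assumes the hand-built sample frame from the question, shows that variant with as_index=False, plus an alternative that is not from the answers above: keeping the three original keys and taking the first value of each extra column via named aggregation (available in pandas 0.25 and later).

```python
import pandas as pd

# Sample data rebuilt from the question (assumed, lower-case column names).
df1 = pd.DataFrame({
    "offer_id": [428572] * 5,
    "affiliate_id": [1327] * 5,
    "affiliate_source": [14331605, 1465, 1336, 14331605, 14331605],
    "affiliate_sub5": [14331605, 1465, 1336, 14331605, 14331605],
    "advertiser_id": [291] * 5,
    "payout_cents": [50] * 5,
    "revenue_cents": [30] * 5,
})

# Variant from the answer: extra columns become additional group keys.
# as_index=False keeps the keys as ordinary columns in the result.
keys = ["offer_id", "affiliate_id", "affiliate_source", "advertiser_id", "affiliate_sub5"]
out = df1.groupby(keys, as_index=False)[["payout_cents", "revenue_cents"]].sum()

# Alternative (a swapped-in technique): group by the three original keys
# and keep the first value seen for each extra column.
cols = ["offer_id", "affiliate_id", "affiliate_source"]
alt = df1.groupby(cols, as_index=False).agg(
    affiliate_sub5=("affiliate_sub5", "first"),
    advertiser_id=("advertiser_id", "first"),
    payout_cents=("payout_cents", "sum"),
    revenue_cents=("revenue_cents", "sum"),
)
print(out)
print(alt)
```

Both results have one row per group. If an extra column varies within a group, the first variant splits that group into several rows, while the second silently keeps only the first value, so which one fits depends on the data.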