Group by one column, sum two other columns, and count a third

Time: 2018-05-29 03:55:08

Tags: python pandas dataframe pandas-groupby

My df:

df_RFQ_by_Salesperson = df[
                          (df['state'].str.contains('Done'))
                          ][['sales_person_name2',
                             'rfq_qty',
                             'rfq_qty_CAD_Equiv',
                             'state'
                            ]].copy()

display(df_RFQ_by_Salesperson.head(3))

    sales_person_name2  rfq_qty     rfq_qty_CAD_Equiv   state
14  AY                 200000.0     2.568713e+05        Done
22  AY                 1000000.0    1.284357e+06        Done
28  YJJ               25000000.0    4.420085e+07        Done

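I want to group df_RFQ_by_Salesperson by sales_person_name2, sum rfq_qty, sum rfq_qty_CAD_Equiv, count state, and then add a percentage column based on rfq_qty_CAD_Equiv. I have worked out the sums and the percentage column, but I am not sure how to work in the count of state.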

df_RFQ_by_Salesperson = df_RFQ_by_Salesperson.rename(columns={'state':'Done Trades'}, level=0) # rename the column header in the groupby
df_RFQ_by_Salesperson = df_RFQ_by_Salesperson.groupby(['sales_person_name2'])[['rfq_qty', 'rfq_qty_CAD_Equiv']].sum()  # select several columns with a list; a bare tuple of names fails in recent pandas
Total_Done_Volume = df_RFQ_by_Salesperson['rfq_qty_CAD_Equiv'].sum()
df_RFQ_by_Salesperson['Percentage'] = df_RFQ_by_Salesperson['rfq_qty_CAD_Equiv'].div(Total_Done_Volume)

display(df_RFQ_by_Salesperson.sort_values('Percentage',ascending=False))

sales_person_name2  rfq_qty     rfq_qty_CAD_Equiv   Percentage  Count of State      
MP                  214400000.0 3.045802e+08        0.258089        ?
AC                  228800000.0 2.648099e+08        0.224390        ?
YJJ                 202500000.0 2.490527e+08        0.211038        ?
RW                  129000000.0 1.693008e+08        0.143459        ?
AY                  118366000.0 1.189635e+08        0.100805        ?
RL                  78617000.0  7.342725e+07        0.062219        ?

Is it possible to get a count alongside the sums in a single groupby?

1 answer:

Answer 0 (score: 1)

You can aggregate multiple columns with different functions by passing a mapping from column name to function:

# df is assumed here to be the frame already filtered to 'Done' rows
out = df.groupby('sales_person_name2').agg(
    {'rfq_qty': 'sum', 'rfq_qty_CAD_Equiv': 'sum', 'state': 'size'}
)
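One detail worth knowing about the mapping above: 'size' counts every row in a group, while 'count' counts only non-null values in that column. The sketch below illustrates the difference on made-up data; the frame `toy` and its values are assumptions for illustration and are not taken from the question.

```python
import numpy as np
import pandas as pd

# Made-up sample data; one 'state' value is missing on purpose.
toy = pd.DataFrame({
    'sales_person_name2': ['AY', 'AY', 'YJJ'],
    'rfq_qty': [200000.0, 1000000.0, 25000000.0],
    'state': ['Done', np.nan, 'Done'],
})

grouped = toy.groupby('sales_person_name2')

# 'size' counts all rows per group, including the row with a missing state.
by_size = grouped.agg({'rfq_qty': 'sum', 'state': 'size'})

# 'count' counts only the non-null state values per group.
by_count = grouped.agg({'rfq_qty': 'sum', 'state': 'count'})
```

If the frame has already been filtered on state, as in the question, no state value should be missing and the two give the same numbers.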

Then compute the percentage separately and assign it to a percentage column:

out['percentage'] = out.rfq_qty_CAD_Equiv / out.rfq_qty_CAD_Equiv.sum()
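Putting the two steps together, here is a minimal end-to-end sketch on made-up rows shaped like the question's frame. It assumes pandas 0.25 or later, since it uses named aggregation rather than the dict form so that the count column gets its own name; `done` and `done_trades` are illustrative names, not from the original post.

```python
import pandas as pd

# Made-up rows in the shape of the question's frame (values are illustrative only).
df = pd.DataFrame({
    'sales_person_name2': ['AY', 'AY', 'YJJ', 'MP'],
    'rfq_qty': [200000.0, 1000000.0, 25000000.0, 500000.0],
    'rfq_qty_CAD_Equiv': [250000.0, 1250000.0, 44000000.0, 500000.0],
    'state': ['Done', 'Done', 'Done', 'Cancelled'],
})

# Keep only the 'Done' rows, as in the question.
done = df[df['state'].str.contains('Done')]

# Named aggregation: output_column=(input_column, function).
out = done.groupby('sales_person_name2').agg(
    rfq_qty=('rfq_qty', 'sum'),
    rfq_qty_CAD_Equiv=('rfq_qty_CAD_Equiv', 'sum'),
    done_trades=('state', 'size'),
)

# Each salesperson's share of the total CAD-equivalent volume.
out['percentage'] = out['rfq_qty_CAD_Equiv'] / out['rfq_qty_CAD_Equiv'].sum()
out = out.sort_values('percentage', ascending=False)
```

Because the percentage is a ratio against the grand total, it has to be computed after the aggregation rather than inside it.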