Here is a sample of my DataFrame, df_Done_Avg_Salesperson_Volume:
state currency_str sales_person_name2 rfq_qty rfq_qty_CAD_Equiv
Done USD AY 200000.0 155720.0
Done USD AY 1000000.0 778600.0
Done USD AY 200000.0 155720.0
Done GBP YJJ 25000000.0 14140000.0
Done GBP YJJ 2500000.0 1946500.0
I want to group by sales_person_name2 and currency_str, but display the mean of rfq_qty and rfq_qty_CAD_Equiv for each group:
sales_person_name2 currency_str Avg rfq_qty Avg rfq_qty_CAD_Equiv
AY USD 466666.6667 363346.6667
YJJ GBP 13750000 8043250
When I try to combine the two, I get a tuple error:
d = {
('rfq_qty',np.mean)
('rfq_qty_CAD_Equiv',np.mean)
}
display(df_Done_Avg_Salesperson_Volume.groupby(['sales_person_name2','currency_str'])['rfq_qty','rfq_qty_CAD_Equiv'].agg(d).reset_index())
TypeError: 'tuple' object is not callable
Is there a way to group by one set of columns while showing summary statistics for two other columns?
Answer 0 (score: 1)
You can simply use:
df.groupby(['sales_person_name2','currency_str'], as_index=False)[['rfq_qty','rfq_qty_CAD_Equiv']].mean()
Output:
sales_person_name2 currency_str rfq_qty rfq_qty_CAD_Equiv
0 AY USD 466666.6666666667 363346.6666666667
1 YJJ GBP 13750000.0 8043250.0
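For reference, here is a minimal self-contained sketch that rebuilds the sample rows from the question and applies the same groupby. The names `df` and `result` are placeholders chosen for this example. The columns are selected with a list (double brackets), which is the form recent pandas versions expect when picking several columns from a groupby.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "state": ["Done"] * 5,
    "currency_str": ["USD", "USD", "USD", "GBP", "GBP"],
    "sales_person_name2": ["AY", "AY", "AY", "YJJ", "YJJ"],
    "rfq_qty": [200000.0, 1000000.0, 200000.0, 25000000.0, 2500000.0],
    "rfq_qty_CAD_Equiv": [155720.0, 778600.0, 155720.0, 14140000.0, 1946500.0],
})

# Group by salesperson and currency, then average the two quantity columns.
# as_index=False keeps the grouping keys as ordinary columns.
result = df.groupby(
    ["sales_person_name2", "currency_str"], as_index=False
)[["rfq_qty", "rfq_qty_CAD_Equiv"]].mean()

print(result)
```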
If you must use .agg(), it needs a dictionary:
d = {
'rfq_qty':np.mean,
'rfq_qty_CAD_Equiv':np.mean
}
df.groupby(['sales_person_name2','currency_str'], as_index=False)[['rfq_qty','rfq_qty_CAD_Equiv']].agg(d)
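As for why the original attempt failed: with no comma between the two parenthesised tuples, Python reads the second pair of parentheses as a call on the first tuple, so the TypeError is raised while `d` is being built, before pandas is involved at all. The sketch below reproduces that with a small stand-in. It then shows named aggregation, which is one way to get renamed columns like the "Avg ..." headers in the question. The output column names `avg_rfq_qty` and `avg_rfq_qty_CAD_Equiv` are assumptions chosen for illustration.

```python
import pandas as pd

# Stand-in for the question's `d = {(...) (...)}`: a tuple followed directly
# by parentheses is a call expression, and tuples are not callable.
first = ("rfq_qty", "mean")
try:
    first("rfq_qty_CAD_Equiv", "mean")
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Sample rows copied from the question.
df = pd.DataFrame({
    "currency_str": ["USD", "USD", "USD", "GBP", "GBP"],
    "sales_person_name2": ["AY", "AY", "AY", "YJJ", "YJJ"],
    "rfq_qty": [200000.0, 1000000.0, 200000.0, 25000000.0, 2500000.0],
    "rfq_qty_CAD_Equiv": [155720.0, 778600.0, 155720.0, 14140000.0, 1946500.0],
})

# Named aggregation: each keyword becomes an output column, and each value
# is a (source column, aggregation) pair.
named = df.groupby(["sales_person_name2", "currency_str"], as_index=False).agg(
    avg_rfq_qty=("rfq_qty", "mean"),
    avg_rfq_qty_CAD_Equiv=("rfq_qty_CAD_Equiv", "mean"),
)
print(named)
```

Passing the string "mean" instead of np.mean gives the same averages and does not require importing NumPy.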