My data is grouped correctly.
df_RFQ_by_Salesperson = df[
    (df['state'].str.contains('Done'))
][['sales_person_name2',
   'rfq_qty',
   'rfq_qty_CAD_Equiv',
   'state'
]].copy()
df_RFQ_by_Salesperson = df_RFQ_by_Salesperson.groupby('sales_person_name2').agg({'state': 'size','rfq_qty': 'sum', 'rfq_qty_CAD_Equiv': 'sum'})
df_RFQ_by_Salesperson['Percentage'] = df_RFQ_by_Salesperson.rfq_qty_CAD_Equiv / df_RFQ_by_Salesperson.rfq_qty_CAD_Equiv.sum()
df_RFQ_by_Salesperson = df_RFQ_by_Salesperson.rename(columns={'state':'Done Trades'}, level=0) # rename the column header in the groupby
display(df_RFQ_by_Salesperson.sort_values('Percentage',ascending=False))
sales_person_name2  Done Trades      rfq_qty  rfq_qty_CAD_Equiv  Percentage
MP                           11  214400000.0       3.045802e+08    0.258089
AC                           22  228800000.0       2.648099e+08    0.224390
YJ                            7  202500000.0       2.490527e+08    0.211038
RW                           18  129000000.0       1.693008e+08    0.143459
AY                          171  118366000.0       1.189635e+08    0.100805
RL                           47   78617000.0       7.342725e+07    0.062219
But when I try to visualise it with sns.countplot, it looks like the column I grouped by is not in the list of columns, so an error is raised.
display(df_RFQ_by_Salesperson.columns)
Index(['Done Trades', 'rfq_qty', 'rfq_qty_CAD_Equiv', 'Percentage'], dtype='object')
# # Visualisation
ax = sns.countplot(
    x='sales_person_name2',
    data=df_RFQ_by_Salesperson,
    # Order by the count
    order = df_RFQ_by_Salesperson['sales_person_name2'].value_counts().index,
    color=plot_colour
)
for label in ax.xaxis.get_ticklabels():
    label.set_rotation(90)
plt.show()
KeyError: 'sales_person_name2'
---> 22 order = df_RFQ_by_Salesperson['sales_person_name2'].value_counts().index,
Is there a way to force Python to include sales_person_name2 in the DataFrame?
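For context, the KeyError is consistent with how pandas groupby behaves by default: the grouping key becomes the index of the result, not a column. Below is a minimal sketch of that behaviour and of two common ways to get the key back as a column. The sample data is hypothetical and only stands in for the original df.

```python
import pandas as pd

# Hypothetical sample data standing in for the original df
df = pd.DataFrame({
    'sales_person_name2': ['MP', 'AC', 'MP', 'AC', 'YJ'],
    'rfq_qty': [100.0, 200.0, 300.0, 400.0, 500.0],
    'state': ['Done', 'Done', 'Done', 'Done', 'Done'],
})

# By default, groupby moves the grouping key into the index
grouped = df.groupby('sales_person_name2').agg({'state': 'size', 'rfq_qty': 'sum'})
print(grouped.index.name)                       # the key is the index name
print('sales_person_name2' in grouped.columns)  # so it is not a column

# Option 1: turn the index back into an ordinary column
flat = grouped.reset_index()
print(flat.columns.tolist())

# Option 2: ask groupby to keep the key as a column from the start
flat_alt = df.groupby('sales_person_name2', as_index=False).agg({'rfq_qty': 'sum'})
print(flat_alt.columns.tolist())
```

After either option, df_RFQ_by_Salesperson['sales_person_name2'] would resolve. Note that sns.countplot counts rows, and the grouped frame has one row per salesperson, so every bar would have height 1; for data that is already aggregated, something like sns.barplot with y='Done Trades' may be closer to the intent.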