I have a DataFrame that looks like this:
Speciality Amount
Greek 15
Greek 16
Italian 8
Italian 11
Italian 13
I have already aggregated the mean and the count for each speciality:
df_by_spec_count = df.groupby('Speciality').agg(['mean', 'count'])
Now I want to print the top 10 specialities with the highest mean.
I tried this:
print(df_by_spec_count.sort_values(by='count', ascending=False).head())
But I get a KeyError. What am I doing wrong?
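Answer 0 (score: 2)
Your columns are hierarchical (a MultiIndex), so you need to pass a tuple that names every column level in order to select the column to sort by: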
In [324]:
df_by_spec_count.sort_values(by=('Amount','count'),ascending=False).head()
Out[324]:
               Amount
                 mean count
Speciality
Italian     10.666667     3
Greek       15.500000     2
If you look at the original groupby result, you can see why:
In [321]:
df_by_spec_count
Out[321]:
               Amount
                 mean count
Speciality
Greek       15.500000     2
Italian     10.666667     3
In [325]:
df_by_spec_count.columns
Out[325]:
MultiIndex(levels=[['Amount'], ['mean', 'count']],
labels=[[0, 0], [0, 1]])
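As a self-contained sketch of the tuple-based sort, the snippet below rebuilds the sample data from the question (the names by_count and by_mean are illustrative, not from the original post). Since the question asks for the specialities with the highest mean, it also sorts by ('Amount', 'mean') and keeps the first 10 rows.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({
    "Speciality": ["Greek", "Greek", "Italian", "Italian", "Italian"],
    "Amount": [15, 16, 8, 11, 13],
})

# Aggregating with a list of functions gives two-level columns:
# ('Amount', 'mean') and ('Amount', 'count')
df_by_spec_count = df.groupby("Speciality").agg(["mean", "count"])

# Sort by count: the key is a tuple with one entry per column level
by_count = df_by_spec_count.sort_values(by=("Amount", "count"), ascending=False)

# Sort by mean instead and keep at most 10 rows
by_mean = df_by_spec_count.sort_values(by=("Amount", "mean"), ascending=False).head(10)

print(by_count)
print(by_mean)
```

With this data, by_count lists Italian first (3 rows) and by_mean lists Greek first (mean 15.5).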
Answer 1 (score: 2)
Another solution is to remove the top column level with MultiIndex.droplevel:
df_by_spec_count = df.groupby('Speciality').agg(['mean', 'count'])
df_by_spec_count.columns = df_by_spec_count.columns.droplevel(0)
print(df_by_spec_count)
                 mean  count
Speciality
Greek       15.500000      2
Italian     10.666667      3
print(df_by_spec_count.sort_values(by='count', ascending=False).head())
                 mean  count
Speciality
Italian     10.666667      3
Greek       15.500000      2
But a better solution is to select the Amount column before aggregating in groupby, so that the resulting columns are not a MultiIndex at all:
df_by_spec_count = df.groupby('Speciality')['Amount'].agg(['mean', 'count'])
print(df_by_spec_count)
                 mean  count
Speciality
Greek       15.500000      2
Italian     10.666667      3
print(df_by_spec_count.sort_values(by='count', ascending=False).head())
                 mean  count
Speciality
Italian     10.666667      3
Greek       15.500000      2
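Once the columns are flat, picking the specialities with the highest mean, which is what the question originally asked for, can be sketched with DataFrame.nlargest. This is only one possible approach, and the names stats and top10 are illustrative; sort_values(by='mean', ascending=False).head(10) would work as well.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({
    "Speciality": ["Greek", "Greek", "Italian", "Italian", "Italian"],
    "Amount": [15, 16, 8, 11, 13],
})

# Selecting 'Amount' first keeps the result columns flat: 'mean' and 'count'
stats = df.groupby("Speciality")["Amount"].agg(["mean", "count"])

# Up to 10 rows with the largest mean, in descending order
top10 = stats.nlargest(10, "mean")
print(top10)
```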