pandas: How do I sort grouped data by mean amount?

Asked: 2016-06-30 14:38:44

Tags: python pandas

I have a DataFrame that looks like this:

Speciality     Amount
Greek          15
Greek          16 
Italian        8
Italian        11
Italian        13

I have now aggregated the mean and the count for each speciality:

df_by_spec_count = df.groupby('Speciality').agg(['mean', 'count'])

Now I want to print the top 10 specialities with the highest mean.

I tried this:

print(df_by_spec_count.sort_values(by='count', ascending=False).head())

But I get a KeyError. What am I doing wrong?

2 Answers:

Answer 0: (score: 2)

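Your columns are hierarchical (a MultiIndex), so you need to pass a tuple that names every level of the column you want to sort by: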

In [324]:

df_by_spec_count.sort_values(by=('Amount','count'),ascending=False).head()
Out[324]:
               Amount      
                 mean count
Speciality                 
Italian     10.666667     3
Greek       15.500000     2

If you look at the original grouped result and its columns, you can see why:

In [321]:
df_by_spec_count

Out[321]:
               Amount      
                 mean count
Speciality                 
Greek       15.500000     2
Italian     10.666667     3

In [325]:
df_by_spec_count.columns

Out[325]:
MultiIndex(levels=[['Amount'], ['mean', 'count']],
           labels=[[0, 0], [0, 1]])
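
Note that the question asks for the highest means, while the example above sorts by count. The following is only a minimal sketch, assuming the sample data from the question; it reproduces the KeyError and then sorts by the ('Amount', 'mean') tuple instead. The names column_labels, raised_key_error and top_by_mean are illustrative and not part of the original code.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Speciality": ["Greek", "Greek", "Italian", "Italian", "Italian"],
    "Amount": [15, 16, 8, 11, 13],
})

df_by_spec_count = df.groupby("Speciality").agg(["mean", "count"])

# Each column label is a tuple such as ('Amount', 'mean').
column_labels = list(df_by_spec_count.columns)

# A bare 'count' matches no top-level column label, so sorting fails.
try:
    df_by_spec_count.sort_values(by="count", ascending=False)
    raised_key_error = False
except KeyError:
    raised_key_error = True

# Passing the full tuple works; sort by mean and keep at most 10 rows.
top_by_mean = df_by_spec_count.sort_values(
    by=("Amount", "mean"), ascending=False
).head(10)
print(top_by_mean)
```

head() with no argument returns only the first 5 rows, so head(10) is needed to get a top 10.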

Answer 1: (score: 2)

Another solution is to remove the top column level with MultiIndex.droplevel:

df_by_spec_count = df.groupby('Speciality').agg(['mean', 'count'])
df_by_spec_count.columns = df_by_spec_count.columns.droplevel(0)
print (df_by_spec_count)

                 mean  count
Speciality                  
Greek       15.500000      2
Italian     10.666667      3

print (df_by_spec_count.sort_values(by='count',ascending=False).head())
                 mean  count
Speciality                  
Italian     10.666667      3
Greek       15.500000      2

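However, a better solution is to select the Amount column before aggregating in the groupby, so that the result does not have a MultiIndex in its columns: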

df_by_spec_count = df.groupby('Speciality')['Amount'].agg(['mean', 'count'])
print (df_by_spec_count)
                 mean  count
Speciality                  
Greek       15.500000      2
Italian     10.666667      3

print (df_by_spec_count.sort_values(by='count',ascending=False).head())
                 mean  count
Speciality                  
Italian     10.666667      3
Greek       15.500000      2
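
The examples in this answer sort by count, whereas the question asks for the top 10 by mean. With flat columns that could look roughly like the sketch below, which assumes the sample data from the question. The names top10 and top10_alt are illustrative, and DataFrame.nlargest is shown only as one possible alternative to sort_values plus head.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Speciality": ["Greek", "Greek", "Italian", "Italian", "Italian"],
    "Amount": [15, 16, 8, 11, 13],
})

# Selecting 'Amount' first keeps the result's columns flat: 'mean' and 'count'.
df_by_spec_count = df.groupby("Speciality")["Amount"].agg(["mean", "count"])

# Sort by mean, highest first, and keep at most 10 specialities.
top10 = df_by_spec_count.sort_values(by="mean", ascending=False).head(10)
print(top10)

# nlargest picks the n rows with the largest values in the given column.
top10_alt = df_by_spec_count.nlargest(10, "mean")
print(top10_alt)
```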