根据分组接受pandas数据帧中的顶行

时间:2016-04-20 21:12:44

标签: python pandas

与此处的问题相关:Reordering pandas dataframe based on multiple column and sum of one column

使用sort列时,如何接受此数据框中的前2个国家/地区: 在这种情况下,前两个国家将是澳大利亚和阿富汗

  Country_FAO type   mean_area        sort
5    Australia  car  12141000.0  18910501.0
4    Australia  car   6475695.0  18910501.0
6    Australia  bus    293806.0  18910501.0
0  Afghanistan  car   2029000.0   2141000.0
1  Afghanistan  car    112000.0   2141000.0
2      Algeria  bus    827000.0    829351.0
3      Algeria  bus      2351.0    829351.0

- 编辑:

我还想保留type列。在这种情况下,解决方案应如下所示:

Country_FAO type   mean_area        sort
5    Australia  car  12141000.0  18910501.0
4    Australia  car   6475695.0  18910501.0
6    Australia  bus    293806.0  18910501.0
0  Afghanistan  car   2029000.0   2141000.0
1  Afghanistan  car    112000.0   2141000.0

1 个答案:

答案 0 :(得分:1)

<强>更新

In [166]: df.loc[df.Country_FAO.isin(df.groupby('Country_FAO').sum().nlargest(2, 'mean_area').index)]
Out[166]:
   Country_FAO type   mean_area        sort
5    Australia  car  12141000.0  18910501.0
4    Australia  car   6475695.0  18910501.0
6    Australia  bus    293806.0  18910501.0
0  Afghanistan  car   2029000.0   2141000.0
1  Afghanistan  car    112000.0   2141000.0

我会这样做:

In [153]: df.groupby('Country_FAO').sum()
Out[153]:
              mean_area
Country_FAO
Afghanistan   2141000.0
Algeria        829351.0
Australia    18910501.0

In [154]: df.groupby('Country_FAO').sum().nlargest(2, 'mean_area')
Out[154]:
              mean_area
Country_FAO
Australia    18910501.0
Afghanistan   2141000.0

In [155]: df.groupby('Country_FAO').sum().nlargest(2, 'mean_area').index
Out[155]: Index(['Australia', 'Afghanistan'], dtype='object', name='Country_FAO')

另外,您可能想要重置索引:

In [156]: df.groupby('Country_FAO').sum().nlargest(2, 'mean_area').reset_index()
Out[156]:
   Country_FAO   mean_area
0    Australia  18910501.0
1  Afghanistan   2141000.0