Question

I have a data frame with 2 columns, as shown below:

Index Year        Country
0     2015        US
1     2015        US
2     2015        UK
3     2015        Indonesia
4     2015        US
5     2016        India
6     2016        India
7     2016        UK

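I want to create a new data frame that holds, for each year, the country with the highest count. The new data frame will have 3 columns, as shown below: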

    Index      Year      Country     Count
    0          2015      US          3
    1          2016      India       2

Is there a quick way to do this in pandas?

Answer 1

Use:

1.

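First, count the rows for each Year and Country pair with groupby and size. Then get the index label of the largest count within each year with idxmax, and select those rows with loc: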

df = df.groupby(['Year','Country']).size()
df = df.loc[df.groupby(level=0).idxmax()].reset_index(name='Count')
print (df)
   Year Country  Count
0  2015      US      3
1  2016   India      2
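
To make the intermediate values easier to follow, the same steps can be sketched as a self-contained script. The sample data is copied from the question; the variable names `counts`, `top_labels` and `result` are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Year": [2015, 2015, 2015, 2015, 2015, 2016, 2016, 2016],
    "Country": ["US", "US", "UK", "Indonesia", "US", "India", "India", "UK"],
})

# Step 1: count rows per (Year, Country) pair. The result is a Series
# indexed by a (Year, Country) MultiIndex.
counts = df.groupby(["Year", "Country"]).size()

# Step 2: within each Year (index level 0), find the (Year, Country)
# label that holds the largest count.
top_labels = counts.groupby(level=0).idxmax()

# Step 3: select those labels and turn the index back into columns.
result = counts.loc[top_labels.tolist()].reset_index(name="Count")
print(result)
```

Note that idxmax returns only the first label with the maximum, so if two countries tie within a year this sketch keeps just one of them.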

2.

Use a custom function with value_counts and head:

# The parentheses let the method chain span several lines.
df = (df.groupby('Year')['Country']
        .apply(lambda x: x.value_counts().head(1))
        .rename_axis(('Year','Country'))
        .reset_index(name='Count'))

print (df)
   Year Country  Count
0  2015      US      3
1  2016   India      2
