Please find the sample data below:
country total_funding_usd sectors
--------------------------------------------
USA 2000000 education
USA 120000 Medical
USA 8000000 Retail
IND 290000 Retail
IND 120000 Medical
CHINA 1100000 Healthcare
CHINA 120000 Medical
AUS 1100000 Retail
AUS 8000000 Medical
AUS 700000 Healthcare
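Query: I want to see the top 2 countries by total funding, counting only the Medical and Retail sectors.
I can group and print, but I can neither limit the output to the top 2 countries nor restrict it to certain sectors; it shows all records. What I have tried is below. Please help.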
master_frame.groupby('country')['total_funding_usd'].max().head()
Expected output:
country sectors total_funding_usd
--------------------------------------------
USA Medical 120000
Retail 8000000
AUS Medical 8000000
Retail 1100000
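Answer 0 (score: 0)
First filter the rows by boolean indexing with isin, then aggregate with sum per country and take the top 2 countries with Series.nlargest, and finally filter again with isin on the resulting index: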
df2 = df[df['sectors'].isin(['Medical','Retail'])]
idx = df2.groupby('country')['total_funding_usd'].sum().nlargest(2).index
df3 = df2[df2['country'].isin(idx)]
print (df3)
country total_funding_usd sectors
1 USA 120000 Medical
2 USA 8000000 Retail
7 AUS 1100000 Retail
8 AUS 8000000 Medical
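For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question as `df` and wraps the three steps in a helper; the function name `top_countries_by_funding` and its parameters are only illustrative and are not part of pandas or of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'country': ['USA', 'USA', 'USA', 'IND', 'IND',
                'CHINA', 'CHINA', 'AUS', 'AUS', 'AUS'],
    'total_funding_usd': [2000000, 120000, 8000000, 290000, 120000,
                          1100000, 120000, 1100000, 8000000, 700000],
    'sectors': ['education', 'Medical', 'Retail', 'Retail', 'Medical',
                'Healthcare', 'Medical', 'Retail', 'Medical', 'Healthcare'],
})


def top_countries_by_funding(frame, sectors, n=2):
    """Hypothetical helper: rows of the n countries with the largest
    summed funding, counting only the given sectors."""
    # Step 1: keep only the requested sectors
    filtered = frame[frame['sectors'].isin(sectors)]
    # Step 2: sum funding per country and take the n largest totals
    top = (filtered.groupby('country')['total_funding_usd']
                   .sum()
                   .nlargest(n)
                   .index)
    # Step 3: keep only the rows belonging to those countries
    return filtered[filtered['country'].isin(top)]


df3 = top_countries_by_funding(df, ['Medical', 'Retail'])
print(df3)
```

Note that the sector match is case-sensitive, so 'education' and 'Healthcare' are excluded here simply because they are not in the list passed to isin.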
Details:
print (df2.groupby('country')['total_funding_usd'].sum())
country
AUS 9100000
CHINA 120000
IND 410000
USA 8120000
Name: total_funding_usd, dtype: int64
print (df2.groupby('country')['total_funding_usd'].sum().nlargest(2))
country
AUS 9100000
USA 8120000
Name: total_funding_usd, dtype: int64
print (df2.groupby('country')['total_funding_usd'].sum().nlargest(2).index)
Index(['AUS', 'USA'], dtype='object', name='country')
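If the goal is a layout closer to the expected output in the question (country and sectors on the left, funding on the right), one possible sketch is to group the filtered rows by both columns. This is an assumption about the desired shape rather than part of the answer above, and the variable names `top` and `out` are illustrative. Because groupby sorts its keys, the countries come out alphabetically (AUS before USA), which differs from the order shown in the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'country': ['USA', 'USA', 'USA', 'IND', 'IND',
                'CHINA', 'CHINA', 'AUS', 'AUS', 'AUS'],
    'total_funding_usd': [2000000, 120000, 8000000, 290000, 120000,
                          1100000, 120000, 1100000, 8000000, 700000],
    'sectors': ['education', 'Medical', 'Retail', 'Retail', 'Medical',
                'Healthcare', 'Medical', 'Retail', 'Medical', 'Healthcare'],
})

# Same filtering steps as in the answer
df2 = df[df['sectors'].isin(['Medical', 'Retail'])]
top = df2.groupby('country')['total_funding_usd'].sum().nlargest(2).index
df3 = df2[df2['country'].isin(top)]

# Group by country and sector so both appear as index levels
out = df3.groupby(['country', 'sectors'])['total_funding_usd'].sum()
print(out)
```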