How to group by with pandas and display data for specific column values

Time: 2019-01-25 08:05:24

Tags: pandas pandas-groupby

Here is some sample data:

country  total_funding_usd      sectors
--------------------------------------------
USA         2000000             education
USA         120000              Medical
USA         8000000             Retail
IND         290000              Retail
IND         120000              Medical
CHINA       1100000             Healthcare
CHINA       120000              Medical
AUS         1100000             Retail
AUS         8000000             Medical
AUS         700000              Healthcare

Query: I want to see the top 2 countries that received the highest total funding, counting only the Medical and Retail sectors.

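I can group and print, but I can neither limit the output to the top 2 countries nor restrict it to certain sectors. It shows all records. What I have tried is below. Please help.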

master_frame.groupby('country')['total_funding_usd'].max().head()

Expected output:

country       sectors   total_funding_usd 
--------------------------------------------

USA     Medical        120000   
        Retail         8000000  

AUS     Medical        8000000
        Retail         1100000

1 answer:

Answer 0: (score: 0)

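First filter the rows to the wanted sectors with boolean indexing, then aggregate with sum per country and take the top 2 countries with Series.nlargest, and finally filter again with isin so that only the rows of those countries remain: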

df2 = df[df['sectors'].isin(['Medical','Retail'])]

idx = df2.groupby('country')['total_funding_usd'].sum().nlargest(2).index

df3 = df2[df2['country'].isin(idx)]
print (df3)
  country  total_funding_usd  sectors
1     USA             120000  Medical
2     USA            8000000   Retail
7     AUS            1100000   Retail
8     AUS            8000000  Medical
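
The snippet above assumes a DataFrame named df already exists. For anyone who wants to run it, here is a minimal self-contained sketch that builds df from the sample data in the question (the construction with pd.DataFrame is an assumption; the question does not show how master_frame was loaded) and then applies the same three steps:

```python
import pandas as pd

# Sample data copied from the question. Building it inline is an assumption;
# the original master_frame was presumably loaded from a file.
df = pd.DataFrame({
    'country': ['USA', 'USA', 'USA', 'IND', 'IND',
                'CHINA', 'CHINA', 'AUS', 'AUS', 'AUS'],
    'total_funding_usd': [2000000, 120000, 8000000, 290000, 120000,
                          1100000, 120000, 1100000, 8000000, 700000],
    'sectors': ['education', 'Medical', 'Retail', 'Retail', 'Medical',
                'Healthcare', 'Medical', 'Retail', 'Medical', 'Healthcare'],
})

# Step 1: keep only the rows for the sectors of interest.
df2 = df[df['sectors'].isin(['Medical', 'Retail'])]

# Step 2: total funding per country, then the labels of the 2 largest totals.
idx = df2.groupby('country')['total_funding_usd'].sum().nlargest(2).index

# Step 3: keep only the rows belonging to those top countries.
df3 = df2[df2['country'].isin(idx)]
print(df3)
```

Filtering by sector before grouping matters here: if the sum were taken over all sectors, USA's education funding and AUS's Healthcare funding would be counted in the ranking as well.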

Details:

print (df2.groupby('country')['total_funding_usd'].sum())
country
AUS      9100000
CHINA     120000
IND       410000
USA      8120000
Name: total_funding_usd, dtype: int64

print (df2.groupby('country')['total_funding_usd'].sum().nlargest(2))
country
AUS    9100000
USA    8120000
Name: total_funding_usd, dtype: int64

print (df2.groupby('country')['total_funding_usd'].sum().nlargest(2).index)
Index(['AUS', 'USA'], dtype='object', name='country')
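
The expected output in the question lists the sectors under each country. One possible way to get closer to that layout is to move country and sectors into a MultiIndex after filtering. The sketch below wraps the steps in a helper; the name top_countries_by_sector and its parameters are hypothetical and not part of pandas or of the original answer. Note that sort_index orders the countries alphabetically, so AUS is printed before USA.

```python
import pandas as pd


def top_countries_by_sector(df, sectors, n=2):
    """Hypothetical helper: funding of the n best-funded countries,
    restricted to the given sectors, indexed by (country, sectors)."""
    # Keep only the requested sectors.
    filtered = df[df['sectors'].isin(sectors)]
    # Rank countries by their summed funding within those sectors.
    top = filtered.groupby('country')['total_funding_usd'].sum().nlargest(n).index
    # Keep the rows of the top countries and nest sectors under country.
    result = filtered[filtered['country'].isin(top)]
    return result.set_index(['country', 'sectors'])['total_funding_usd'].sort_index()


# Sample data copied from the question.
df = pd.DataFrame({
    'country': ['USA', 'USA', 'USA', 'IND', 'IND',
                'CHINA', 'CHINA', 'AUS', 'AUS', 'AUS'],
    'total_funding_usd': [2000000, 120000, 8000000, 290000, 120000,
                          1100000, 120000, 1100000, 8000000, 700000],
    'sectors': ['education', 'Medical', 'Retail', 'Retail', 'Medical',
                'Healthcare', 'Medical', 'Retail', 'Medical', 'Healthcare'],
})

out = top_countries_by_sector(df, ['Medical', 'Retail'])
print(out)
```

Because the result is a Series with a two-level index, pandas prints each country label once with its sectors listed beneath it, which resembles the grouping shown in the question.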