Question

I am looking at US census data:

                                                population  
State           County    
Alabama         Jefferson County                658466  
                Mobile County                   412992  
                Madison County                  334811  
Alaska          Anchorage Municipality          291826  
                Fairbanks North Star Borough    97581  
                Matanuska-Susitna Borough       88995

The final output should total the population for each state:

State           SumOfPopulation 
Alabama         1406269                  
Alaska          478402

My groupby attempt raised the following error:

df.groupby('State')['population'].agg('sum') 

KeyError: 'STNAME'

What is the appropriate way to do this?

Answer 1

df.groupby('State', as_index=False)['population'].sum()

This works as expected.

Answer 2

Your code works in pandas 0.20.0+, but it is simpler to drop agg and call sum directly:

df.groupby('State', as_index=False)['population'].sum()

For older versions, call reset_index first so that the MultiIndex levels become columns:

df.reset_index().groupby('State', as_index=False)['population'].sum()
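To make the reset_index approach concrete, here is a minimal sketch that rebuilds the question's data and applies it. The way df is constructed is an assumption about the asker's layout (State and County as MultiIndex levels, population as the only column), and the names flat and totals are only illustrative.

```python
import pandas as pd

# Assumed layout: State and County form a MultiIndex, population is the only column.
index = pd.MultiIndex.from_tuples(
    [
        ("Alabama", "Jefferson County"),
        ("Alabama", "Mobile County"),
        ("Alabama", "Madison County"),
        ("Alaska", "Anchorage Municipality"),
        ("Alaska", "Fairbanks North Star Borough"),
        ("Alaska", "Matanuska-Susitna Borough"),
    ],
    names=["State", "County"],
)
df = pd.DataFrame(
    {"population": [658466, 412992, 334811, 291826, 97581, 88995]},
    index=index,
)

# reset_index() turns both index levels into ordinary columns,
# so 'State' can be grouped on as a regular column.
flat = df.reset_index()
totals = flat.groupby("State", as_index=False)["population"].sum()
print(totals)
```

Because State is an ordinary column after reset_index, this form does not depend on groupby being able to resolve index level names.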

The simplest solution is to call sum with the level argument:

df = df['population'].sum(level='State').reset_index()
# requires pandas < 2.0, where sum still accepts the level argument
# to select the level by position instead:
# df = df['population'].sum(level=0).reset_index()

print (df)
     State  population
0  Alabama     1406269
1   Alaska      478402
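On pandas 2.0 and later, sum no longer accepts a level argument (it was deprecated in 1.3). A sketch of an equivalent that groups on the index level instead is shown here; the DataFrame construction is again an assumption about the asker's data, and by_state is an illustrative name.

```python
import pandas as pd

# Assumed layout: State and County form a MultiIndex, population is the only column.
index = pd.MultiIndex.from_tuples(
    [
        ("Alabama", "Jefferson County"),
        ("Alabama", "Mobile County"),
        ("Alabama", "Madison County"),
        ("Alaska", "Anchorage Municipality"),
        ("Alaska", "Fairbanks North Star Borough"),
        ("Alaska", "Matanuska-Susitna Borough"),
    ],
    names=["State", "County"],
)
df = pd.DataFrame(
    {"population": [658466, 412992, 334811, 291826, 97581, 88995]},
    index=index,
)

# Group directly on the 'State' index level, sum each group,
# then move State back into a regular column.
by_state = df.groupby(level="State")["population"].sum().reset_index()
print(by_state)
```

Passing level=0 instead of level="State" should select the same level by position.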
