This is a DataFrame named census:
   SUMLEV  REGION  COUNTY      STNAME          CTYNAME  CENSUS2010POP  ESTIMATESBASE2010
0      50       3       1     Alabama   Autauga County          54571              54571
1      50       3       3     Alabama   Baldwin County         182265             182265
2      50       3       5     Alabama   Barbour County          27457              27457
3      50       4       3     Arizona   Cochise County         131346             131357
4      50       4       5     Arizona  Coconino County         134421             134437
5      50       4       7     Arizona      Gila County          53597              53597
6      50       4      21  California     Glenn County          28122              28122
7      50       4      23  California  Humboldt County         134623             134623
8      50       4      25  California  Imperial County         174528              17452
I want to compute the sum and the mean of 'CENSUS2010POP' for each state ('STNAME') and display the result as a DataFrame.
Here is my code:
census.set_index('STNAME')
census.groupby(level=0).CENSUS2010POP.agg({'avg': np.mean, 'sum': np.sum}).head()
But it raises the error: nested renamer is not supported
I also tried
census.groupby('STNAME').CENSUS2010POP.agg({'avg':np.mean, 'sum':np.sum})
It raises the same error as above.
Answer 0 (score: 1)
Because only a single column is being aggregated, you can pass a list of (name, function) tuples:
df = census.groupby('STNAME').CENSUS2010POP.agg([('avg', np.mean), ('sum', np.sum)]).head()
print (df)
                      avg     sum
STNAME
Alabama      88097.666667  264293
Arizona     106454.666667  319364
California  112424.333333  337273
Or use named aggregation:
census.groupby('STNAME').agg(avg=('CENSUS2010POP', np.mean),
                             sum=('CENSUS2010POP', np.sum)).head()
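
For anyone who wants to try this without the full census file, here is a minimal self-contained sketch of the named aggregation approach. It rebuilds only the three columns needed from the sample rows in the question, and it assumes pandas 0.25 or later, where named aggregation is available. The output column name total and the use of the string names 'mean' and 'sum' in place of np.mean and np.sum are my own choices for illustration, not part of the original code.

```python
import pandas as pd

# Sample rows copied from the question; only the columns needed here.
census = pd.DataFrame({
    "STNAME": ["Alabama"] * 3 + ["Arizona"] * 3 + ["California"] * 3,
    "CTYNAME": [
        "Autauga County", "Baldwin County", "Barbour County",
        "Cochise County", "Coconino County", "Gila County",
        "Glenn County", "Humboldt County", "Imperial County",
    ],
    "CENSUS2010POP": [
        54571, 182265, 27457,
        131346, 134421, 53597,
        28122, 134623, 174528,
    ],
})

# Named aggregation: output_column=(input_column, function).
# "total" is a hypothetical output name chosen for this sketch.
result = census.groupby("STNAME").agg(
    avg=("CENSUS2010POP", "mean"),
    total=("CENSUS2010POP", "sum"),
)

print(result)
```

The result is a DataFrame indexed by STNAME with one column per keyword, so no renaming dict is passed to agg.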