I have this DataFrame:
STATE County POP
1 Alabama Autauga County 54571
2 Alabama Baldwin County 182265
3 Alabama Barbour County 27457
4 Alabama Bibb County 22915
5 Alabama Blount County 57322
6 Alabama Bullock County 10914
7 Alabama Butler County 20947
8 Alabama Calhoun County 118572
...
3162 Wisconsin Washburn County 15911
3163 Wisconsin Washington County 131887
3164 Wisconsin Waukesha County 389891
3165 Wisconsin Waupaca County 52410
3166 Wisconsin Waushara County 24496
3167 Wisconsin Winnebago County 166994
3168 Wisconsin Wood County 74749
...
3182 Wyoming Natrona County 75450
3183 Wyoming Niobrara County 2484
3184 Wyoming Park County 28205
3185 Wyoming Platte County 8667
3186 Wyoming Sheridan County 29116
How can I group the data by state and county to show the top 3 counties by population in each state?
STATE COUNTY POP
Alabama Baldwin County 182265
Calhoun County 118572
Blount County 57322
Wisconsin Waukesha County 389891
Winnebago County 166994
Washington County 131887
Wyoming Natrona County 75450
Sheridan County 29116
Park County 28205
I tried:
df.sort_values('POP',ascending=False).groupby(['STATE','COUNTY']).sum().head(3)
But this shows only the first 3 rows overall, whereas I want the first 3 rows of each group:
POP
STATE COUNTY
Alabama Autauga County 54571
Baldwin County 182265
Barbour County 27457
Answer 0 (score: 3)
Use groupby twice, then nlargest followed by reset_index:
(df.groupby(['STATE', 'County'])['POP'].sum()
.groupby(level=0, group_keys=False).nlargest(3).reset_index())
STATE County POP
0 Alabama Baldwin County 182265
1 Alabama Calhoun County 118572
2 Alabama Blount County 57322
3 Wisconsin Waukesha County 389891
4 Wisconsin Winnebago County 166994
5 Wisconsin Washington County 131887
6 Wyoming Natrona County 75450
7 Wyoming Sheridan County 29116
8 Wyoming Park County 28205
Alternatively, if you prefer not to reset the index, the output will be:
STATE County
Alabama Baldwin County 182265
Calhoun County 118572
Blount County 57322
Wisconsin Waukesha County 389891
Winnebago County 166994
Washington County 131887
Wyoming Natrona County 75450
Sheridan County 29116
Park County 28205
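For anyone who wants to try this without the full dataset, here is a minimal self-contained sketch of the same two-step groupby. The small DataFrame is a hypothetical sample built from a few rows shown in the question, and the names totals and top3 are only illustrative:

```python
import pandas as pd

# Hypothetical sample: a handful of rows copied from the question's data.
df = pd.DataFrame({
    "STATE": ["Alabama"] * 4 + ["Wyoming"] * 4,
    "County": [
        "Autauga County", "Baldwin County", "Blount County", "Calhoun County",
        "Natrona County", "Niobrara County", "Park County", "Sheridan County",
    ],
    "POP": [54571, 182265, 57322, 118572, 75450, 2484, 28205, 29116],
})

# First groupby: total population per (STATE, County) pair.
totals = df.groupby(["STATE", "County"])["POP"].sum()

# Second groupby: within each state (index level 0), keep the 3 largest totals.
# group_keys=False avoids prepending STATE to the index a second time.
top3 = totals.groupby(level=0, group_keys=False).nlargest(3).reset_index()
print(top3)
```

The first groupby only matters if a (STATE, County) pair can appear on more than one row. If every pair is unique, the sum leaves the values unchanged.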
Answer 1 (score: 1)
Use DataFrame.sort_values on two columns, followed by GroupBy.head:
# If (STATE, County) pairs can repeat, aggregate them first:
# df = df.groupby(['STATE','County'], as_index=False).sum()
df = df.sort_values(['STATE','POP'], ascending=[True, False]).groupby('STATE').head(3)
print(df)
STATE County POP
2 Alabama Baldwin County 182265
8 Alabama Calhoun County 118572
5 Alabama Blount County 57322
3164 Wisconsin Waukesha County 389891
3167 Wisconsin Winnebago County 166994
3163 Wisconsin Washington County 131887
3182 Wyoming Natrona County 75450
3186 Wyoming Sheridan County 29116
3184 Wyoming Park County 28205
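A runnable sketch of the sort-then-head approach, using a hypothetical sample built from a few rows of the question's data (the name top3 is only illustrative). It shows that GroupBy.head keeps the original row labels and every column, which is why the index above is not consecutive:

```python
import pandas as pd

# Hypothetical sample: a handful of rows copied from the question's data.
df = pd.DataFrame({
    "STATE": ["Alabama"] * 4 + ["Wyoming"] * 4,
    "County": [
        "Autauga County", "Baldwin County", "Blount County", "Calhoun County",
        "Natrona County", "Niobrara County", "Park County", "Sheridan County",
    ],
    "POP": [54571, 182265, 57322, 118572, 75450, 2484, 28205, 29116],
})

# Sort by state (ascending) and population (descending), then take the
# first 3 rows of each state. head() keeps the sorted row order.
top3 = (
    df.sort_values(["STATE", "POP"], ascending=[True, False])
      .groupby("STATE")
      .head(3)
)
print(top3)
```

If a clean 0..n-1 index is preferred, calling reset_index(drop=True) on the result should do it.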
If you need a MultiIndex, add DataFrame.set_index:
df = (df.sort_values(['STATE','POP'], ascending=[True, False])
        .groupby('STATE')
        .head(3)
        .set_index(['STATE','County']))