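I've got an interesting one! I tried to find a duplicate question but had no luck...
My dataframe contains all U.S. states and territories for 2013-2016, with several attributes.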
>>> df.head(2)
state enrollees utilizing enrol_age65 util_age65 year
1 Alabama 637247 635431 473376 474334 2013
2 Alaska 30486 28514 21721 20457 2013
>>> df.tail(2)
state enrollees utilizing enrol_age65 util_age65 year
214 Puerto Rico 581861 579514 453181 450150 2016
215 U.S. Territories 24329 16979 22608 15921 2016
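I want to group by year and state and show the top 3 states for each year (by 'enrollees' or 'utilizing', it doesn't matter which).
Desired output: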
enrollees utilizing
year state
2013 California 3933310 3823455
New York 3133980 3002948
Florida 2984799 2847574
...
2016 California 4516216 4365896
Florida 4186823 3984756
New York 4009829 3874682
So far I've tried the following:
df.groupby(['year','state'])[['enrollees','utilizing']].sum().head(3)
which only returns the first 3 rows of the grouped result:
enrollees utilizing
year state
2013 Alabama 637247 635431
Alaska 30486 28514
Arizona 707683 683273
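The `.head(3)` above is `DataFrame.head`, which acts on the whole aggregated frame, so it simply returns its first three rows. `GroupBy.head(3)` instead keeps the first three rows of each group, so sorting by year and enrollees before grouping on the `year` index level gives a per-year top 3. A minimal sketch, assuming a small made-up toy frame (the states and numbers are hypothetical, not the asker's data):

```python
import pandas as pd

# Hypothetical toy data with the same columns as the question's df.
df = pd.DataFrame({
    "state": ["A", "B", "C", "D", "A", "B", "C", "D"],
    "enrollees": [10, 40, 30, 20, 50, 15, 25, 35],
    "utilizing": [9, 38, 29, 18, 47, 14, 24, 33],
    "year": [2013] * 4 + [2014] * 4,
})

# Sum per (year, state); the result has a (year, state) MultiIndex.
agg = df.groupby(["year", "state"])[["enrollees", "utilizing"]].sum()

# sort_values accepts index level names as well as column names:
# years ascending, enrollees descending within each year.
ordered = agg.sort_values(["year", "enrollees"], ascending=[True, False])

# GroupBy.head keeps the first 3 rows of each year, preserving the sorted order.
top3 = ordered.groupby(level="year").head(3)
print(top3)
```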
I also tried a lambda function:
df.groupby(['year','state'])[['enrollees','utilizing']]\
.apply(lambda x: np.sum(x)).nlargest(3, 'enrollees')
which returns the 3 largest rows across the whole grouped result, not the top 3 per year:
enrollees utilizing
year state
2016 California 4516216 4365896
2015 California 4324304 4191704
2014 California 4133532 4011208
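I think this might have something to do with the index of the GroupBy object, but I'm not sure... Any guidance would be greatly appreciated!
Answer 0 (score: 1)
Then, on the GroupBy object, you need to …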
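On the index question: the aggregated result carries a two-level MultiIndex (`year`, `state`), so per-year operations can target `level="year"`. One way to use that, sketched here on hypothetical toy data, is to rank states within each year and keep ranks 1 to 3:

```python
import pandas as pd

# Hypothetical toy data with the same columns as the question's df.
df = pd.DataFrame({
    "state": ["A", "B", "C", "D", "A", "B", "C", "D"],
    "enrollees": [10, 40, 30, 20, 50, 15, 25, 35],
    "utilizing": [9, 38, 29, 18, 47, 14, 24, 33],
    "year": [2013] * 4 + [2014] * 4,
})

agg = df.groupby(["year", "state"])[["enrollees", "utilizing"]].sum()
print(agg.index.names)  # the two index levels: year and state

# Rank enrollees within each year (largest = 1); method="first" breaks ties by position.
rank = agg.groupby(level="year")["enrollees"].rank(ascending=False, method="first")

# Keep ranks 1-3, then order by year and descending enrollees for display.
top3 = agg[rank <= 3].sort_values(["year", "enrollees"], ascending=[True, False])
print(top3)
```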
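Assuming the truncated answer refers to a second groupby over the aggregated result, one way to express that is to group by the `year` index level and call `nlargest` inside each group. A sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data with the same columns as the question's df.
df = pd.DataFrame({
    "state": ["A", "B", "C", "D", "A", "B", "C", "D"],
    "enrollees": [10, 40, 30, 20, 50, 15, 25, 35],
    "utilizing": [9, 38, 29, 18, 47, 14, 24, 33],
    "year": [2013] * 4 + [2014] * 4,
})

# First groupby: sum per (year, state).
agg = df.groupby(["year", "state"])[["enrollees", "utilizing"]].sum()

# Second groupby on the "year" level; group_keys=False stops pandas from
# prepending the year again, so the (year, state) index is kept as-is.
top3 = agg.groupby(level="year", group_keys=False).apply(
    lambda g: g.nlargest(3, "enrollees")
)
print(top3)
```

Unlike the question's `.nlargest(3, 'enrollees')` on the whole frame, here `nlargest` runs once per year.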
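Answer 1 (score: 1)
Well, you can do something that isn't very pretty.
First, use set() to get a list of the unique years:
years_list = sorted(set(df.year))
Create an empty DataFrame and a concatenation helper function I wrote a while ago: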
import pandas as pd

def concatenate_loop_dfs(df_temp, df_full, axis=0):
    """
    Avoid retyping the same line of code for every df.
    df_temp is the temporary df created in each loop iteration; df_full is the
    concatenated df that will hold all values. df_full must first be initialized
    outside the loop as df_name = pd.DataFrame().
    """
    if df_full.empty:
        df_full = df_temp
    else:
        df_full = pd.concat([df_full, df_temp], axis=axis)
    return df_full
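As an aside, the helper can usually be replaced by collecting the per-year frames in a list and calling `pd.concat` once after the loop. A sketch of that variant on made-up toy data (the name `pieces` is only illustrative):

```python
import pandas as pd

# Hypothetical toy data with the same columns as the question's df.
df = pd.DataFrame({
    "state": ["A", "B", "C", "D", "A", "B", "C", "D"],
    "enrollees": [10, 40, 30, 20, 50, 15, 25, 35],
    "utilizing": [9, 38, 29, 18, 47, 14, 24, 33],
    "year": [2013] * 4 + [2014] * 4,
})

pieces = []  # one top-3 frame per year
for year in sorted(df["year"].unique()):
    top = (
        df[df["year"] == year]
        .groupby(["year", "state"])[["enrollees", "utilizing"]]
        .sum()
        .sort_values("enrollees", ascending=False)
        .head(3)
    )
    pieces.append(top)

# A single concat at the end replaces repeated concatenate_loop_dfs calls,
# so the growing result is not re-copied on every iteration.
df_final = pd.concat(pieces)
print(df_final)
```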
Create the empty final DataFrame:
df_final = pd.DataFrame()
Now loop over each year and aggregate the results into the new DataFrame:
for year in years_list:
    # query() keeps only the rows where the year column equals the loop variable;
    # the @ prefix refers to the external Python variable `year`
    df2 = df.query("year == @year")
    # group the single-year df, sort by enrollees and keep the top 3
    df_temp = df2.groupby(['year','state'])[['enrollees','utilizing']].sum().sort_values(by="enrollees", ascending=False).head(3)
    # finally, call our helper, which keeps concatenating the temporary dfs
    df_final = concatenate_loop_dfs(df_temp, df_final)
Done.
print(df_final)