我需要为所有往年的国家[noc]建立“ total_medals”列的移动平均值-我的数据看起来像:
medal Bronze Gold Medal Silver **total_medals**
noc year
ALG 1984 2.0 NaN NaN NaN 2.0
1992 4.0 2.0 NaN NaN 6.0
1996 2.0 1.0 4.0 7.0
ANZ 1984 2.0 15.0 NaN 2.0 19.0
1992 3.0 5.0 NaN 2.0 10.0
1996 1.0 2.0 2.0 5.0
ARG 1984 2.0 6.0 NaN 3.0 11.0
1992 5.0 3.0 NaN 24.0 32.0
1992 3.0 7.0 NaN 5.0 15.0
我想要每个国家和每年的移动平均值(即ALG:1984年平均(总奖章)= 2.0; 1992年平均(总奖章)=(2.0 + 6.0)/ 2 = 4.0; 1996年平均(总奖章)=( 2.0 + 6.0 + 7.0)/ 3 = 5.0)-移动平均值应出现在新列中(total_medals旁边)。
此外,对于每个国家/地区和年份组合,称为“绩效”的新列应为“总奖赏”除以“移动平均值”的比例
答案 0 :(得分:1)
示例数据框:
print(df)
medal Bronze Gold Medal Silver
noc year
ALG 1984 2.0 NaN NaN NaN 2.0
1992 4.0 2.0 NaN NaN 6.0
1996 2.0 1.0 NaN 4.0 7.0
ANZ 1984 2.0 15.0 NaN 2.0 19.0
1992 3.0 5.0 NaN 2.0 10.0
1996 1.0 2.0 NaN 2.0 5.0
ARG 1984 2.0 6.0 NaN 3.0 11.0
1992 5.0 3.0 NaN 24.0 32.0
1992 3.0 7.0 NaN 5.0 15.0
使用DataFrame.groupby
+ expanding
:
df['total_mean']=df.groupby(level=0,sort=False).Silver.apply(lambda x: x.expanding(1).mean())
print(df)
medal Bronze Gold Medal Silver total_medals
noc year
ALG 1984 2.0 NaN NaN NaN 2.0 2.000000
1992 4.0 2.0 NaN NaN 6.0 4.000000
1996 2.0 1.0 NaN 4.0 7.0 5.000000
ANZ 1984 2.0 15.0 NaN 2.0 19.0 19.000000
1992 3.0 5.0 NaN 2.0 10.0 14.500000
1996 1.0 2.0 NaN 2.0 5.0 11.333333
ARG 1984 2.0 6.0 NaN 3.0 11.0 11.000000
1992 5.0 3.0 NaN 24.0 32.0 21.500000
1992 3.0 7.0 NaN 5.0 15.0 19.333333
bonze落后
s=df.groupby('noc').apply(lambda x: x['Bronze']/x['total_medals'].shift())
s.index=s.index.droplevel()
df['bronze_lagged']=s
您可以为此创建函数...
def lagged_medals(type_of_medal):
s=df.groupby('noc').apply(lambda x: x[type_of_medal]/x['total_medals'].shift())
s.index=s.index.droplevel()
df[f'{type_of_medal}_lagged']=s
lagged_medals('Silver')
#print(df)