按列/年的移动平均值-python,熊猫

时间:2019-10-30 19:31:36

标签: python pandas dataframe moving-average

我需要为所有往年的国家[noc]建立“ total_medals”列的移动平均值-我的数据看起来像:

 medal     Bronze  Gold  Medal  Silver  **total_medals**
    noc year                                           
    ALG 1984     2.0   NaN    NaN     NaN           2.0
        1992     4.0   2.0    NaN     NaN           6.0
        1996     2.0   1.0            4.0           7.0
    ANZ 1984     2.0  15.0    NaN     2.0          19.0
        1992     3.0   5.0    NaN     2.0          10.0
        1996     1.0   2.0            2.0           5.0
    ARG 1984     2.0   6.0    NaN     3.0          11.0
        1992     5.0   3.0    NaN    24.0          32.0
        1992     3.0   7.0    NaN     5.0          15.0

我想要每个国家和每年的移动平均值(即ALG:1984年平均(总奖章)= 2.0; 1992年平均(总奖章)=(2.0 + 6.0)/ 2 = 4.0; 1996年平均(总奖章)=( 2.0 + 6.0 + 7.0)/ 3 = 5.0)-移动平均值应出现在新列中(total_medals旁边)。

此外,对于每个国家/地区和年份组合,称为“绩效”的新列应为“总奖赏”除以“移动平均值”的比例

1 个答案:

答案 0 :(得分:1)

示例数据框

print(df)

          medal  Bronze  Gold  Medal  Silver 
noc year                                     
ALG 1984    2.0     NaN   NaN    NaN     2.0 
    1992    4.0     2.0   NaN    NaN     6.0 
    1996    2.0     1.0   NaN    4.0     7.0 
ANZ 1984    2.0    15.0   NaN    2.0    19.0 
    1992    3.0     5.0   NaN    2.0    10.0 
    1996    1.0     2.0   NaN    2.0     5.0 
ARG 1984    2.0     6.0   NaN    3.0    11.0 
    1992    5.0     3.0   NaN   24.0    32.0 
    1992    3.0     7.0   NaN    5.0    15.0 

使用DataFrame.groupby + expanding

df['total_mean']=df.groupby(level=0,sort=False).Silver.apply(lambda x: x.expanding(1).mean())
print(df)

          medal  Bronze  Gold  Medal  Silver  total_medals 
noc year                                                 
ALG 1984    2.0     NaN   NaN    NaN     2.0    2.000000 
    1992    4.0     2.0   NaN    NaN     6.0    4.000000 
    1996    2.0     1.0   NaN    4.0     7.0    5.000000 
ANZ 1984    2.0    15.0   NaN    2.0    19.0   19.000000 
    1992    3.0     5.0   NaN    2.0    10.0   14.500000 
    1996    1.0     2.0   NaN    2.0     5.0   11.333333 
ARG 1984    2.0     6.0   NaN    3.0    11.0   11.000000 
    1992    5.0     3.0   NaN   24.0    32.0   21.500000 
    1992    3.0     7.0   NaN    5.0    15.0   19.333333 

bonze落后

s=df.groupby('noc').apply(lambda x: x['Bronze']/x['total_medals'].shift())
s.index=s.index.droplevel()
df['bronze_lagged']=s

您可以为此创建函数...

def lagged_medals(type_of_medal):
    s=df.groupby('noc').apply(lambda x: x[type_of_medal]/x['total_medals'].shift())
    s.index=s.index.droplevel()
    df[f'{type_of_medal}_lagged']=s

lagged_medals('Silver')
#print(df)