Pandas: calculate the mean and add the value in a new column

Time: 2020-09-30 02:22:00

Tags: pandas

I have this df:

import pandas as pd

df = pd.DataFrame({'Players': ['John', 'Will', 'John', 'Will', 'John', 'Will'],
                   'Round': [1, 1, 2, 2, 3, 3],
                   'Goals': [0, 1, 1, 1, 2, 0]})

Which prints:

  Players  Round  Goals
0    John      1      0
1    Will      1      1
2    John      2      1
3    Will      2      1
4    John      3      2
5    Will      3      0

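Now, how can I keep the same structure and calculate a running mean() of Goals for each player, round by round, as a new column, so that the result looks like this: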

  Players  Round  Goals   Mean
0    John      1      0   0
1    Will      1      1   1
2    John      2      1   0.5
3    Will      2      1   1
4    John      3      2   1
5    Will      3      0   0.67

2 answers:

Answer 0 (score: 3)

Try using cumsum and cumcount together with groupby:

g = df.groupby('Players')['Goals']
# running total of goals divided by the number of rows seen so far (cumcount starts at 0)
df['Mean'] = g.cumsum() / (g.cumcount() + 1)

Output:

  Players  Round  Goals      Mean
0    John      1      0  0.000000
1    Will      1      1  1.000000
2    John      2      1  0.500000
3    Will      2      1  1.000000
4    John      3      2  1.000000
5    Will      3      0  0.666667
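
For reference, here is a self-contained sketch of the same idea, split into named steps. The names running_total and rows_seen are only illustrative and do not appear in the answer above. It assumes the rows are already in round order within each player, as they are in the question's data.

```python
import pandas as pd

df = pd.DataFrame({'Players': ['John', 'Will', 'John', 'Will', 'John', 'Will'],
                   'Round': [1, 1, 2, 2, 3, 3],
                   'Goals': [0, 1, 1, 1, 2, 0]})

g = df.groupby('Players')['Goals']

# Goals accumulated so far for each player, in row order.
running_total = g.cumsum()

# cumcount() numbers each player's rows 0, 1, 2, ..., so add 1
# to get how many rows that player has had up to this point.
rows_seen = g.cumcount() + 1

# Running mean = running total / number of rows seen so far.
df['Mean'] = running_total / rows_seen
print(df)
```

Both cumsum() and cumcount() return results aligned to the original index, so the assignment keeps the DataFrame's row order unchanged.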

Answer 1 (score: 3)

First, do this:

mean = df.groupby('Players')['Goals'].expanding().mean()

Which gives you:

Players   
John     0    0.000000
         2    0.500000
         4    1.000000
Will     1    1.000000
         3    1.000000
         5    0.666667
Name: Goals, dtype: float64

We don't need the Players column as part of the index, so drop it:

mean.index = mean.index.droplevel(0)
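
To see what this step changes, the sketch below inspects the index before and after. It rebuilds the question's df so that it runs on its own, and levels_before is an illustrative name only.

```python
import pandas as pd

df = pd.DataFrame({'Players': ['John', 'Will', 'John', 'Will', 'John', 'Will'],
                   'Round': [1, 1, 2, 2, 3, 3],
                   'Goals': [0, 1, 1, 1, 2, 0]})

mean = df.groupby('Players')['Goals'].expanding().mean()

# The result has a two-level index: the group key (Players)
# plus the original row label.
levels_before = mean.index.nlevels
print(levels_before)

# Dropping level 0 leaves only the original row labels, which is
# what lets the Series line up with df's rows again.
mean.index = mean.index.droplevel(0)
print(mean.index.tolist())
```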

Finally, assign it to the original DataFrame:

df['Mean'] = mean

The final result is:

  Players  Round  Goals      Mean
0    John      1      0  0.000000
1    Will      1      1  1.000000
2    John      2      1  0.500000
3    Will      2      1  1.000000
4    John      3      2  1.000000
5    Will      3      0  0.666667
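
As a possible variant that is not part of the answer above, the three steps can be chained by using reset_index(level=0, drop=True) in place of droplevel. This is a sketch under the same assumption that each player's rows are already in round order. The check variable is illustrative and only confirms that the result matches the cumsum / cumcount approach.

```python
import pandas as pd

df = pd.DataFrame({'Players': ['John', 'Will', 'John', 'Will', 'John', 'Will'],
                   'Round': [1, 1, 2, 2, 3, 3],
                   'Goals': [0, 1, 1, 1, 2, 0]})

# Expanding (running) mean per player; dropping the Players index level
# leaves the original row labels, so the assignment aligns row by row.
df['Mean'] = (df.groupby('Players')['Goals']
                .expanding()
                .mean()
                .reset_index(level=0, drop=True))

# Cross-check against the cumsum / cumcount formulation.
g = df.groupby('Players')['Goals']
check = g.cumsum() / (g.cumcount() + 1)
print((df['Mean'] - check).abs().max())
```

The assignment relies on index alignment rather than position, so the grouped order of the Series (all of John's rows, then all of Will's) does not matter.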