Question

I have this df:

import pandas as pd

df = pd.DataFrame({'Players':['John', 'Will', 'John', 'Will', 'John', 'Will'],
                    'Round': [1, 1, 2, 2, 3, 3],
                    'Goals': [0, 1, 1, 1, 2, 0]})

It prints:

  Players  Round  Goals
0    John      1      0
1    Will      1      1
2    John      2      1
3    Will      2      1
4    John      3      2
5    Will      3      0

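Now, how can I keep the same structure and compute a running mean() for each player, row by row, as a new column, so that the result looks like this: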

  Players  Round  Goals   Mean
0    John      1      0   0
1    Will      1      1   1
2    John      2      1   0.5
3    Will      2      1   1
4    John      3      2   1
5    Will      3      0   0.6

Answer 1

Try using cumsum and cumcount together with groupby:

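# Group the Goals column by player
g = df.groupby('Players')['Goals']
# Running total divided by the number of rows seen so far (cumcount starts at 0)
df['Mean'] = g.cumsum() / (g.cumcount() + 1)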

Output:

  Players  Round  Goals      Mean
0    John      1      0  0.000000
1    Will      1      1  1.000000
2    John      2      1  0.500000
3    Will      2      1  1.000000
4    John      3      2  1.000000
5    Will      3      0  0.666667
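
To see why this works, it may help to keep the two intermediate pieces as columns. The sketch below rebuilds the question's df and adds RunningTotal and GamesPlayed; those two column names are made up for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Players': ['John', 'Will', 'John', 'Will', 'John', 'Will'],
                   'Round': [1, 1, 2, 2, 3, 3],
                   'Goals': [0, 1, 1, 1, 2, 0]})

g = df.groupby('Players')['Goals']

# Hypothetical helper columns, kept only to show the arithmetic
df['RunningTotal'] = g.cumsum()        # goals scored so far, per player
df['GamesPlayed'] = g.cumcount() + 1   # cumcount is 0-based, so add 1

# Running mean = running total / number of rows seen so far
df['Mean'] = df['RunningTotal'] / df['GamesPlayed']

print(df)
```

For Will in round 3, for example, the running total is 2 over 3 games, which gives 2/3.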

Answer 2

First, do this:

mean = df.groupby('Players')['Goals'].expanding().mean()

This gives you:

Players   
John     0    0.000000
         2    0.500000
         4    1.000000
Will     1    1.000000
         3    1.000000
         5    0.666667
Name: Goals, dtype: float64

We don't need the Players level as part of the index, so drop it:

mean.index = mean.index.droplevel(0)

Finally, assign it to the original DataFrame (the values are aligned on the remaining row index):

df['Mean'] = mean

The final result is:

  Players  Round  Goals      Mean
0    John      1      0  0.000000
1    Will      1      1  1.000000
2    John      2      1  0.500000
3    Will      2      1  1.000000
4    John      3      2  1.000000
5    Will      3      0  0.666667
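
As a possible shortcut, the same expanding mean can be computed in a single step with transform, which returns a Series that already carries the original row index, so no droplevel is needed. This is a sketch of an alternative and not part of the answer above; it assumes the same df as in the question.

```python
import pandas as pd

df = pd.DataFrame({'Players': ['John', 'Will', 'John', 'Will', 'John', 'Will'],
                   'Round': [1, 1, 2, 2, 3, 3],
                   'Goals': [0, 1, 1, 1, 2, 0]})

# transform applies the function to each player's Goals and
# returns the results aligned with df's original index
df['Mean'] = (df.groupby('Players')['Goals']
                .transform(lambda s: s.expanding().mean()))

print(df)
```

The lambda is called once per group, so on a large DataFrame with many players this may be slower than the cumsum / cumcount approach.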

Pandas - calculate mean and add value in new column
