I have this df:
import pandas as pd
df = pd.DataFrame({'Players': ['John', 'Will', 'John', 'Will', 'John', 'Will'],
                   'Round': [1, 1, 2, 2, 3, 3],
                   'Goals': [0, 1, 1, 1, 2, 0]})
Printing it gives:
Players Round Goals
0 John 1 0
1 Will 1 1
2 John 2 1
3 Will 2 1
4 John 3 2
5 Will 3 0
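Now, how can I keep the same structure and compute a running mean() of Goals for each player, round by round, as a new column, so that the result looks like this: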
Players Round Goals Mean
0 John 1 0 0
1 Will 1 1 1
2 John 2 1 0.5
3 Will 2 1 1
4 John 3 2 1
5 Will 3 0 0.67
Answer 0 (score: 3)
Try using cumsum and cumcount together with groupby:
g = df.groupby('Players')['Goals']
df['Mean'] = g.cumsum() / (g.cumcount() + 1)
Output:
Players Round Goals Mean
0 John 1 0 0.000000
1 Will 1 1 1.000000
2 John 2 1 0.500000
3 Will 2 1 1.000000
4 John 3 2 1.000000
5 Will 3 0 0.666667
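To make the arithmetic behind this answer explicit, here is a minimal, self-contained sketch that splits the division into its two parts. The names running_total and games_played are illustrative and not from the original answer. It assumes the rows are already ordered by Round within each player, since cumsum and cumcount follow row order.

```python
import pandas as pd

df = pd.DataFrame({
    'Players': ['John', 'Will', 'John', 'Will', 'John', 'Will'],
    'Round': [1, 1, 2, 2, 3, 3],
    'Goals': [0, 1, 1, 1, 2, 0],
})

g = df.groupby('Players')['Goals']

# Goals scored so far by each player, in row order.
running_total = g.cumsum()

# cumcount numbers each player's rows starting at 0, so add 1
# to get the number of rounds played so far.
games_played = g.cumcount() + 1

# Running mean = running total / rounds played so far.
df['Mean'] = running_total / games_played
print(df)
```

If the frame might not be sorted by Round, sorting it first (for example with df.sort_values('Round')) would probably be needed for the running mean to follow round order.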
Answer 1 (score: 3)
First, do this:
mean = df.groupby('Players')['Goals'].expanding().mean()
This gives you:
Players
John 0 0.000000
2 0.500000
4 1.000000
Will 1 1.000000
3 1.000000
5 0.666667
Name: Goals, dtype: float64
We don't need Players as part of the index, so drop that level:
mean.index = mean.index.droplevel(0)
Finally, assign the result back to the original DataFrame:
df['Mean'] = mean
The final result is:
Players Round Goals Mean
0 John 1 0 0.000000
1 Will 1 1 1.000000
2 John 2 1 0.500000
3 Will 2 1 1.000000
4 John 3 2 1.000000
5 Will 3 0 0.666667
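As a possible variation on the steps in this answer (not part of the original), the expanding mean can be computed through transform, which returns a Series aligned with the original index, so no index level has to be dropped. This is a sketch under the same assumption that rows are ordered by Round within each player.

```python
import pandas as pd

df = pd.DataFrame({
    'Players': ['John', 'Will', 'John', 'Will', 'John', 'Will'],
    'Round': [1, 1, 2, 2, 3, 3],
    'Goals': [0, 1, 1, 1, 2, 0],
})

# transform applies the function to each player's Goals and keeps
# the original row index, so the result can be assigned directly.
df['Mean'] = (
    df.groupby('Players')['Goals']
      .transform(lambda s: s.expanding().mean())
)
print(df)
```

The trade-off is that the lambda runs once per group in Python, which may be slower than the built-in groupby expanding call on frames with many groups.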