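I am trying to compute a rolling aggregate rate for a time series.
The way to think about the data is that it holds the results of several multi-game series against different teams. We don't know who won a series until its last game. I am trying to compute the win rate as it evolves against each opposing team.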
series_id date opposing_team won_series
1 1/1/2000 a 0
1 1/3/2000 a 0
1 1/5/2000 a 1
2 1/4/2000 a 0
2 1/7/2000 a 0
2 1/9/2000 a 0
3 1/6/2000 b 0
becomes:
series_id date opposing_team won_series percent_win_against_team
1 1/1/2000 a 0 NA
1 1/3/2000 a 0 NA
1 1/5/2000 a 1 100
2 1/4/2000 a 0 NA
2 1/7/2000 a 0 100
2 1/9/2000 a 0 50
3 1/6/2000 b 0 0
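Answer 0 (score: 1)
I still don't feel I understand the rule you use to decide when a series has ended. Does it end after 3 games? And why NA? I would have expected 1/3. Still, here is one way to keep track of the number of completed series and a win rate against each team.
Define 26472215table.csv: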
series_id,date,opposing_team,won_series
1,1/1/2000,a,0
1,1/3/2000,a,0
1,1/5/2000,a,1
2,1/4/2000,a,0
2,1/7/2000,a,0
2,1/9/2000,a,0
3,1/6/2000,b,0
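If creating a file is inconvenient, the same table can probably be loaded from a string instead. This is a minimal sketch; the name CSV_TEXT is made up here for illustration and is not part of the original answer.

```python
import io

import pandas as pd

# CSV_TEXT is a hypothetical name; its content mirrors 26472215table.csv.
CSV_TEXT = """series_id,date,opposing_team,won_series
1,1/1/2000,a,0
1,1/3/2000,a,0
1,1/5/2000,a,1
2,1/4/2000,a,0
2,1/7/2000,a,0
2,1/9/2000,a,0
3,1/6/2000,b,0
"""

# read_csv accepts any file-like object, so StringIO stands in for the file.
df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df)
```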
Code:
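import pandas as pd
import numpy as np
df = pd.read_csv('26472215table.csv')
# Last game date of each series; that game is taken to complete the series.
# Dates are compared as strings here, which works for this sample only;
# parse them with pd.to_datetime for real data.
sr = df.groupby('series_id')['date'].max()
sr.name = 'LastGame'
df2 = df.join(sr, on='series_id', how='left')
# The rows are left in file order, which is the order the result below reflects;
# a chronological running rate would need df2 = df2.sort_values('date') on parsed dates.
df2['series_comp'] = df2['date'] == df2['LastGame']
# Running counts of completed series and of series wins per opposing team.
df2['running_sr_cnt'] = df2.groupby('opposing_team')['series_comp'].cumsum()
df2['running_win_cnt'] = df2.groupby('opposing_team')['won_series'].cumsum()
# The win rate is undefined (NaN) until one series against that team is complete.
winrate = lambda x: x['running_win_cnt'] / x['running_sr_cnt'] if x['running_sr_cnt'] > 0 else np.nan
df2['winrate'] = df2[['running_sr_cnt', 'running_win_cnt']].apply(winrate, axis=1)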
Result of df2[['date', 'winrate']]:
date winrate
0 1/1/2000 NaN
1 1/3/2000 NaN
2 1/5/2000 1.0
3 1/4/2000 1.0
4 1/7/2000 1.0
5 1/9/2000 0.5
6 1/6/2000 0.0
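Note that the row for 1/4/2000 shows 1.0 because the running counts follow file order, while the question expects NA there. A chronological variant might look like the sketch below. It assumes the dates are month/day/year and that won_series is 1 only on the final game of a won series; running_win_rate and CSV_TEXT are names made up for this illustration.

```python
import io

import pandas as pd

# CSV_TEXT is a hypothetical name; its content mirrors 26472215table.csv.
CSV_TEXT = """series_id,date,opposing_team,won_series
1,1/1/2000,a,0
1,1/3/2000,a,0
1,1/5/2000,a,1
2,1/4/2000,a,0
2,1/7/2000,a,0
2,1/9/2000,a,0
3,1/6/2000,b,0
"""


def running_win_rate(df):
    """Return a date-sorted copy of df with a running 'winrate' per opposing team."""
    out = df.copy()
    # Assumption: dates are month/day/year.
    out['date'] = pd.to_datetime(out['date'], format='%m/%d/%Y')
    # Stable sort, so games on the same date keep their file order.
    out = out.sort_values('date', kind='mergesort')
    # A series counts as complete on the date of its last game.
    last_game = out.groupby('series_id')['date'].transform('max')
    out['series_comp'] = out['date'] == last_game
    by_team = out.groupby('opposing_team')
    completed = by_team['series_comp'].cumsum()
    wins = by_team['won_series'].cumsum()
    # Leave the rate as NaN until at least one series against the team is complete.
    out['winrate'] = wins / completed.where(completed > 0)
    return out


result = running_win_rate(pd.read_csv(io.StringIO(CSV_TEXT)))
print(result[['date', 'opposing_team', 'winrate']])
```

On the sample data this leaves the 1/4/2000 game at NaN and gives 1.0 on 1/5 and 1/7, 0.5 on 1/9, and 0.0 for team b on 1/6, which agrees with the percent_win_against_team column in the question when read as fractions.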