Calculating a rolling aggregate win rate with pandas

Time: 2014-10-20 18:13:02

Tags: python pandas

I'm trying to calculate a rolling aggregate rate over a time series.

The way to think about the data: it is the results of a sequence of multi-game series against different teams. We don't know who won a series until its last game. I'm trying to calculate the running win percentage against each opposing team.

series_id     date      opposing_team   won_series
 1          1/1/2000         a            0
 1          1/3/2000         a            0
 1          1/5/2000         a            1
 2          1/4/2000         a            0
 2          1/7/2000         a            0
 2          1/9/2000         a            0
 3          1/6/2000         b            0

becomes:

series_id     date      opposing_team   won_series    percent_win_against_team
 1          1/1/2000         a            0                    NA
 1          1/3/2000         a            0                    NA
 1          1/5/2000         a            1                    100
 2          1/4/2000         a            0                    NA
 2          1/7/2000         a            0                    100
 2          1/9/2000         a            0                    50
 3          1/6/2000         b            0                    0

1 answer:

Answer 0 (score: 1):

I still don't feel I understand your rule for deciding when a series ends. Does it end at 3 games? And why the NA — I would have expected a value on 1/3. Regardless, here is one way to track the running count of completed series and the running win rate against each opposing team (e.g. team a).

Define 26472215table.csv:

series_id,date,opposing_team,won_series
1,1/1/2000,a,0
1,1/3/2000,a,0
1,1/5/2000,a,1
2,1/4/2000,a,0
2,1/7/2000,a,0
2,1/9/2000,a,0
3,1/6/2000,b,0
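One caveat before the code: `date` is read as plain strings, and `'M/D/YYYY'` strings do not always sort chronologically (the answer's use of `max('date')` happens to work for this sample, but would break once two-digit days appear). A minimal sketch of the pitfall and the fix, assuming a consistent `M/D/YYYY` format:

```python
import pandas as pd

dates = pd.Series(['1/1/2000', '1/10/2000', '1/2/2000'])

# Lexicographic order is wrong here: '1/10/2000' sorts before '1/2/2000'
print(sorted(dates))

# Parsing to datetimes gives true chronological order
parsed = pd.to_datetime(dates, format='%m/%d/%Y')
print(parsed.sort_values().tolist())
```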

Code:

import pandas as pd

df = pd.read_csv('26472215table.csv')

# The last game date of each series marks when its result becomes known
last_game = df.groupby('series_id')['date'].max()
last_game.name = 'LastGame'
df2 = df.join(last_game, on='series_id', how='left')

# 1 on the row that completes a series (cast so cumsum is numeric)
df2['series_comp'] = (df2['date'] == df2['LastGame']).astype(int)

# Running counts per opposing team: completed series and series wins
df2['running_sr_cnt'] = df2.groupby('opposing_team')['series_comp'].cumsum()
df2['running_win_cnt'] = df2.groupby('opposing_team')['won_series'].cumsum()

# Win rate = wins so far / completed series so far (None until a series ends)
winrate = lambda r: r['running_win_cnt'] / r['running_sr_cnt'] if r['running_sr_cnt'] > 0 else None
df2['winrate'] = df2[['running_sr_cnt', 'running_win_cnt']].apply(winrate, axis=1)

Result of df2[['date', 'winrate']]:

       date  winrate
0  1/1/2000      NaN
1  1/3/2000      NaN
2  1/5/2000      1.0
3  1/4/2000      1.0
4  1/7/2000      1.0
5  1/9/2000      0.5
6  1/6/2000      0.0
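For reference, the same running counts can also be computed without a row-wise `apply`, using `transform` to flag series-ending games and a vectorized division, with `where` turning zero denominators into NaN. A sketch reproducing the output above (data rebuilt inline rather than read from the CSV):

```python
import pandas as pd

# Same rows as 26472215table.csv
df = pd.DataFrame({
    'series_id':     [1, 1, 1, 2, 2, 2, 3],
    'date':          ['1/1/2000', '1/3/2000', '1/5/2000',
                      '1/4/2000', '1/7/2000', '1/9/2000', '1/6/2000'],
    'opposing_team': ['a', 'a', 'a', 'a', 'a', 'a', 'b'],
    'won_series':    [0, 0, 1, 0, 0, 0, 0],
})

# Flag the last game of each series without an intermediate join
df['series_comp'] = (df['date'] ==
                     df.groupby('series_id')['date'].transform('max')).astype(int)

grp = df.groupby('opposing_team')
sr_cnt = grp['series_comp'].cumsum()   # completed series so far, per team
win_cnt = grp['won_series'].cumsum()   # series wins so far, per team

# Zero completed series -> NaN denominator -> NaN win rate
df['winrate'] = win_cnt / sr_cnt.where(sr_cnt > 0)
print(df[['date', 'winrate']])
```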