My data looks like this:
In [16]: game_df.head(9)
Out[16]:
   team_id  game_id game_date  w  l  wins  losses  winning%
0        1        1  11/16/18  1  0    20      10  0.666667
1        1        3  11/18/18  0  1    20      11  0.645161
2        1        6  11/21/18  0  1    20      12  0.625000
3        2        4  11/19/18  1  0    16      14  0.533333
4        2        8  11/23/18  1  0    17      14  0.548387
5        2        9  11/24/18  0  1    17      15  0.531250
6        3        2  11/17/18  0  1    24       8  0.750000
7        3        5  11/20/18  1  0    25       8  0.757576
8        3        7  11/22/18  1  0    26       8  0.764706
I need to take the winning% column and, for every row, subtract the latest observed winning% of each team_id (the row's own team included) from that row's value, keeping only the difference against the maximum, i.e. how far each team sits behind the current league leader as of that date.
So I want to get back something like this:
In [16]: game_df.head(9)
Out[16]:
   team_id  game_id game_date  w  l  wins  losses  winning%    w%_bac
0        1        1  11/16/18  1  0    20      10  0.666667        --
1        1        3  11/18/18  0  1    20      11  0.645161  -0.10483
2        1        6  11/21/18  0  1    20      12  0.625000  -0.13257
3        2        4  11/19/18  1  0    16      14  0.533333  -0.21667
4        2        8  11/23/18  1  0    17      14  0.548387  -0.21632
5        2        9  11/24/18  0  1    17      15  0.531250  -0.23346
6        3        2  11/17/18  0  1    24       8  0.750000   0.00000
7        3        5  11/20/18  1  0    25       8  0.757576   0.00000
8        3        7  11/22/18  1  0    26       8  0.764706   0.00000
So in game 9, on 11/24/18, team 2 lost and its winning% dropped from 0.548387 to 0.531250, leaving it further behind the other two teams, which at that point stood at 0.625000 (team 1) and 0.764706 (team 3). The %back for team 2 is therefore 0.531250 - 0.764706 ≈ -0.233456.
Finally, I also need each team_id's rank at each moment in time: on 11/24/18 the ranking would be 3, 1, 2. A sketch of what I mean is below.
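To make the ranking part concrete, here is a rough sketch of the kind of computation I have in mind (pandas assumed; latest and order are illustrative names, not working code I already have):

import pandas as pd

# one column per team, forward-filled so every date shows each team's
# most recent winning% as of that date (my data has one game per date;
# real dates should be parsed with pd.to_datetime so they sort properly)
latest = (game_df.pivot(index='game_date', columns='team_id',
                        values='winning%')
                 .ffill())

# teams ordered by their latest winning% on a given date;
# on 11/24/18 this yields [3, 1, 2]
order = latest.loc['11/24/18'].sort_values(ascending=False).index.tolist()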
Thanks
Answer 0 (score: 0)
import pandas as pd

df = df.sort_values(by='game_date')  # sort by date (a string here; parse with
                                     # pd.to_datetime for multi-month data)

# per-team column with that team's latest winning% as of each date:
# reindex onto the full date sequence, forward-fill (but not backward)
for team_id in df['team_id'].unique():
    df[str(team_id) + 'win_%'] = (df.loc[df.team_id == team_id]
                                    .set_index('game_date')['winning%']
                                    .reindex(df.game_date)
                                    .sort_index().ffill().values)

# fill the NaNs before each team's first game with 0
df = df.fillna(0)

# greatest deficit per row (winning% minus the league-best latest value
# is the same as the minimum of the differences against every team's column)
df['w%_bac'] = pd.concat([df['winning%'] - df['1win_%'],
                          df['winning%'] - df['2win_%'],
                          df['winning%'] - df['3win_%']], axis=1).min(axis=1)

# drop the helper columns
df = df.drop(columns=['1win_%', '2win_%', '3win_%'])
df
   team_id  game_id game_date  w  l  wins  losses  winning%  w%_bac
0        1        1  11/16/18  1  0    20      10     0.667   0.000
6        3        2  11/17/18  0  1    24       8     0.750   0.000
1        1        3  11/18/18  0  1    20      11     0.645  -0.105
3        2        4  11/19/18  1  0    16      14     0.533  -0.217
7        3        5  11/20/18  1  0    25       8     0.758   0.000
2        1        6  11/21/18  0  1    20      12     0.625  -0.133
8        3        7  11/22/18  1  0    26       8     0.765   0.000
4        2        8  11/23/18  1  0    17      14     0.548  -0.216
5        2        9  11/24/18  0  1    17      15     0.531  -0.233
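A side note on the hardcoded concat: the helper-column names are already derived from df['team_id'].unique() in the loop, so the same list can be reused to make the snippet work for any number of teams. A minimal sketch of that variation (same logic, just generic names):

# same deficit computation without spelling out '1win_%' etc. by hand
helper_cols = [str(t) + 'win_%' for t in df['team_id'].unique()]
df['w%_bac'] = pd.concat(
    [df['winning%'] - df[c] for c in helper_cols], axis=1).min(axis=1)
df = df.drop(columns=helper_cols)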