I'm new to pandas. I have a DataFrame of horse race results that looks like this (only much bigger):
        Horses  RaceDate  Position
1     RedHorse    1/2/00         2
2    BlueHorse    1/2/00         6
3  YellowHorse    1/2/00         7
4     RedHorse   15/1/00         1
5     RedHorse    1/1/00         5
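To experiment with the answers below, the sample frame can be rebuilt as follows. This is a minimal sketch that assumes RaceDate is written day/month/two-digit year (so 15/1/00 is 15 January 2000). Converting the strings to real dates matters: as plain text, "15/1/00" sorts after "1/2/00", which would put RedHorse's races in the wrong order.

```python
import pandas as pd

# Sample data from the question (index starts at 1, as shown there)
df = pd.DataFrame(
    {
        "Horses": ["RedHorse", "BlueHorse", "YellowHorse", "RedHorse", "RedHorse"],
        "RaceDate": ["1/2/00", "1/2/00", "1/2/00", "15/1/00", "1/1/00"],
        "Position": [2, 6, 7, 1, 5],
    },
    index=[1, 2, 3, 4, 5],
)

# Assumption: dates are day/month/two-digit year, so parse with an explicit format
df["RaceDate"] = pd.to_datetime(df["RaceDate"], format="%d/%m/%y")

# Chronological order; a stable sort keeps same-day rows in their original order
print(df.sort_values("RaceDate", kind="stable"))
```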
For each entry, I want to work out that horse's win percentage at the time it ran that race, like this:
        Horses  RaceDate  Position  WinPercentage
1     RedHorse    1/2/00         2            50%
2    BlueHorse    1/2/00         6             0%
3  YellowHorse    1/2/00         7             0%
4     RedHorse   15/1/00         5           100%
5     RedHorse    1/1/00         1             0%
How can I do this?
Answer 0 (score: 1)
Wins per horse
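import numpy as np
import pandas as pd

# RaceDate holds day/month/year strings; parse them so sorting is chronological
df['RaceDate'] = pd.to_datetime(df['RaceDate'], format='%d/%m/%y')

df2 = df.copy(deep=True)
df2 = df2.reset_index()  # keep the original row labels in an 'index' column
df2 = df2.sort_values('RaceDate')
df2['win'] = np.where(df2.Position == 1, 1, 0)
# Running number of wins for each horse, in date order
df2['win_count'] = df2.groupby(['Horses'])['win'].cumsum()
# Note: this is a running total of win_count per horse, not the number of races run
df2['race_count'] = df2.groupby(['Horses'])['win_count'].cumsum()
df2['WinPercentage'] = df2['win_count'] / df2['race_count'] * 100
df2 = df2.sort_index()  # back to the original row order
print(df2)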
Output:
   index       Horses    RaceDate  Position  win  win_count  race_count  WinPercentage
0      1     RedHorse  2000-02-01         2    0          1           2           50.0
1      2    BlueHorse  2000-02-01         6    0          0           0            NaN
2      3  YellowHorse  2000-02-01         7    0          0           0            NaN
3      4     RedHorse  2000-01-15         1    1          1           1          100.0
4      5     RedHorse  2000-01-01         5    0          0           0            NaN
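In the code above, race_count is a running total of win_count rather than the number of races the horse has run, so the percentages are only coincidentally right for some rows (for example, RedHorse's win on 15/1/00 shows 100% although it is that horse's second race). Below is a minimal sketch that counts races per horse with groupby(...).cumcount() instead, under the same day/month/year date assumption. It adds WinPercentage (current race included) and a hypothetical PriorWinPercentage column that only uses the horse's earlier races, which is NaN for a horse's first race.

```python
import pandas as pd

# Sample data from the question (index starts at 1, as shown there)
df = pd.DataFrame(
    {
        "Horses": ["RedHorse", "BlueHorse", "YellowHorse", "RedHorse", "RedHorse"],
        "RaceDate": ["1/2/00", "1/2/00", "1/2/00", "15/1/00", "1/1/00"],
        "Position": [2, 6, 7, 1, 5],
    },
    index=[1, 2, 3, 4, 5],
)
# Assumption: day/month/two-digit-year dates
df["RaceDate"] = pd.to_datetime(df["RaceDate"], format="%d/%m/%y")

# Work in date order; a stable sort keeps same-day rows in their original order
out = df.sort_values("RaceDate", kind="stable")
win = (out["Position"] == 1).astype(int)
by_horse = win.groupby(out["Horses"])

# Wins and races per horse, up to and including the current race
win_count = by_horse.cumsum()
race_count = by_horse.cumcount() + 1
out["WinPercentage"] = win_count / race_count * 100

# Hypothetical extra column: the same ratio over earlier races only (NaN when there are none)
prior_races = race_count - 1
out["PriorWinPercentage"] = (win_count - win) / prior_races.where(prior_races > 0) * 100

# Restore the original row order
out = out.sort_index()
print(out)
```

Call .fillna(0) on PriorWinPercentage if a horse's first race should show 0% rather than NaN.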
Wins per row (running totals over all rows, regardless of horse)
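df1 = df.copy(deep=True)
df1 = df1.reset_index()
df1 = df1.sort_values(['RaceDate', 'index'])  # by date, ties broken by original row
df1['win'] = np.where(df1.Position == 1, 1, 0)
df1['win'] = df1.win.ffill()  # no effect here: 'win' has no missing values
# Running totals over all rows, regardless of which horse ran
df1['win_count'] = df1.win.cumsum()
df1['race_count'] = df1.win_count.cumsum()  # running total of win_count
df1['WinPercentage'] = df1['win_count'] / df1['race_count'] * 100
print(df1)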
Output:
   index       Horses    RaceDate  Position  win  win_count  race_count  WinPercentage
4      5     RedHorse  2000-01-01         5    0          0           0            NaN
3      4     RedHorse  2000-01-15         1    1          1           1     100.000000
0      1     RedHorse  2000-02-01         2    0          1           2      50.000000
1      2    BlueHorse  2000-02-01         6    0          1           3      33.333333
2      3  YellowHorse  2000-02-01         7    0          1           4      25.000000
Combining the two dataframes
# WinPercentage_x comes from df1 (per row), WinPercentage_y from df2 (per horse)
cols = ['index', 'Horses', 'RaceDate', 'WinPercentage']
dfFinal = df1[cols].merge(df2[cols], on=['index', 'Horses', 'RaceDate'], how='outer')
print(dfFinal)
Output:
   index       Horses    RaceDate  WinPercentage_x  WinPercentage_y
0      5     RedHorse  2000-01-01              NaN              NaN
1      4     RedHorse  2000-01-15       100.000000            100.0
2      1     RedHorse  2000-02-01        50.000000             50.0
3      2    BlueHorse  2000-02-01        33.333333              NaN
4      3  YellowHorse  2000-02-01        25.000000              NaN
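Answer 1 (score: 0)
If I understand you correctly, you need to sort by date, group by horse, and then take the running (expanding) mean of whether the horse won each race?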
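df['RaceDate'] = pd.to_datetime(df['RaceDate'], format='%d/%m/%y')  # day-first dates, so sorting is chronological
df['WonRace'] = (df['Position'] == 1).astype(int)
df.sort_values('RaceDate').groupby('Horses')['WonRace'].expanding().mean()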
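The expression above returns a new Series indexed by (Horses, original row label) rather than adding a column to df. Below is a minimal sketch of writing it back as a percentage, assuming day/month/two-digit-year dates as in the question. Note that an expanding mean counts the current race, so a horse's first race scores 100% if it won and 0% otherwise.

```python
import pandas as pd

# Sample data from the question (index starts at 1, as shown there)
df = pd.DataFrame(
    {
        "Horses": ["RedHorse", "BlueHorse", "YellowHorse", "RedHorse", "RedHorse"],
        "RaceDate": ["1/2/00", "1/2/00", "1/2/00", "15/1/00", "1/1/00"],
        "Position": [2, 6, 7, 1, 5],
    },
    index=[1, 2, 3, 4, 5],
)
# Assumption: day/month/two-digit-year dates
df["RaceDate"] = pd.to_datetime(df["RaceDate"], format="%d/%m/%y")

# 1 for a win, 0 otherwise
df["WonRace"] = (df["Position"] == 1).astype(int)

# Running share of wins per horse in date order (current race included);
# the result is indexed by (Horses, original row label)
running = (
    df.sort_values("RaceDate", kind="stable")
    .groupby("Horses")["WonRace"]
    .expanding()
    .mean()
)

# Drop the Horses level so the values align with df's own index again
df["WinPercentage"] = running.reset_index(level=0, drop=True) * 100
print(df)
```

Converting the booleans to integers before calling expanding() keeps the window calculation on a plain numeric dtype.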