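I have a pandas DataFrame of NBA game results in which each row represents one game, along with a particular player's stats for that game. Here is a small sample of rows: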
   League  Offensive Rebound %     Player_Name  ...  Blocks  Outcome     Age
0     NBA                  1.0  Alaa Abdelnaby  ...     0.0   W (+8)  25-211
1     NBA                  2.0  Alaa Abdelnaby  ...     2.0   W (+8)  25-214
2     NBA                  5.0  Alaa Abdelnaby  ...     1.0   W (+5)  25-216
3     NBA                  0.0  Alaa Abdelnaby  ...     0.0  W (+12)  25-220
4     NBA                  2.0  Alaa Abdelnaby  ...     0.0  L (-35)  25-222
5     NBA                  0.0  Alaa Abdelnaby  ...     0.0  L (-13)  25-223
6     NBA                  0.0  Alaa Abdelnaby  ...     0.0  L (-19)  25-238
7     NBA                  0.0  Alaa Abdelnaby  ...     0.0  L (-11)  25-240
8     NBA                  0.0  Alaa Abdelnaby  ...     0.0   L (-9)  25-241
9     NBA                  0.0  Alaa Abdelnaby  ...     0.0   L (-2)  25-243
[10 rows x 31 columns]
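I want to add columns that track the team's record up to that point in the season. The approach I came up with is to group the DataFrame by season, team, and player, then sort each group by date and iterate over it, incrementing the win or loss total after each game. Here is the code: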
import numpy as np

# Get a group for each player each season
grouped_game_data = game_data.groupby(['Season', 'Team', 'Player_Name'])
game_data['Team Wins'] = np.nan
game_data['Team Losses'] = np.nan

# Iterate through players per season
for name, group in grouped_game_data:
    # Keep track of current wins and losses
    curr_wins = 0
    curr_losses = 0
    # Iterate through games in a season by player, in date order
    for idx, game in group.sort_values(by='Date').iterrows():
        # Store the record *before* this game is counted
        game_data.loc[idx, 'Team Wins'] = curr_wins
        game_data.loc[idx, 'Team Losses'] = curr_losses
        # Outcome looks like 'W (+8)' or 'L (-35)'; the first character is the result
        if game['Outcome'][0] == 'W':
            curr_wins += 1
        elif game['Outcome'][0] == 'L':
            curr_losses += 1
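This seems to work well, but it takes a while to run. Is there a way to vectorize this code, or some other more efficient way to get the same result? I will probably have to do similar operations later and would like them to take less time. Thanks in advance!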
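
For reference, here is a rough sketch of what a vectorized version of the loop above might look like, using a grouped cumulative sum. It is not benchmarked against the real data and rests on some assumptions: the index of game_data is unique, the Date column sorts chronologically, and Outcome always starts with 'W' or 'L'. The function name add_running_record and the toy rows are made up for illustration. One difference from the loop is that the new columns come out as integers rather than floats.

```python
import pandas as pd


def add_running_record(game_data):
    """Return a copy with 'Team Wins' / 'Team Losses' before each game.

    Hypothetical helper: assumes a unique index and a sortable 'Date' column.
    """
    keys = ['Season', 'Team', 'Player_Name']

    # Put games in date order so the cumulative sums run chronologically.
    ordered = game_data.sort_values(by='Date', kind='stable')

    # 1 where the game was a win (or loss), 0 otherwise.
    first_char = ordered['Outcome'].str[0]
    is_win = (first_char == 'W').astype(int)
    is_loss = (first_char == 'L').astype(int)

    # Running total within each (season, team, player) group, minus the
    # current game, gives the record going into that game.
    group_keys = [ordered[k] for k in keys]
    wins_before = is_win.groupby(group_keys).cumsum() - is_win
    losses_before = is_loss.groupby(group_keys).cumsum() - is_loss

    # Assignment aligns on the index, so the original row order is kept.
    result = game_data.copy()
    result['Team Wins'] = wins_before
    result['Team Losses'] = losses_before
    return result


# Toy data (not real game results), deliberately out of date order.
demo = pd.DataFrame({
    'Season': [1993, 1993, 1993, 1993],
    'Team': ['AAA', 'AAA', 'AAA', 'AAA'],
    'Player_Name': ['Player A', 'Player A', 'Player A', 'Player A'],
    'Date': pd.to_datetime(['1993-01-05', '1993-01-02',
                            '1993-01-08', '1993-01-10']),
    'Outcome': ['L (-35)', 'W (+8)', 'W (+5)', 'L (-13)'],
})
result = add_running_record(demo)
print(result[['Date', 'Outcome', 'Team Wins', 'Team Losses']])
```

The idea is that cumsum counts the current game as well, so subtracting the current row's own 0/1 flag reproduces the "record before this game" behaviour of the loop without iterating row by row.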