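Here is the DataFrame I am working with.
These are chess games. I am trying to group by game and then apply a function to each game based on the number of moves played in that game.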
game_id move_number colour avg_centi
0 03gDhPWr 1 white NaN
1 03gDhPWr 2 black 37.0
2 03gDhPWr 3 white 61.0
3 03gDhPWr 4 black -5.0
4 03gDhPWr 5 white 26.0
5 03gDhPWr 6 black 31.0
6 03gDhPWr 7 white -2.0
... ... ... ... ...
110091 zzaiRa7s 34 black NaN
110092 zzaiRa7s 35 white NaN
110093 zzaiRa7s 36 black NaN
110094 zzaiRa7s 37 white NaN
110095 zzaiRa7s 38 black NaN
110096 zzaiRa7s 39 white NaN
110097 zzaiRa7s 40 black NaN
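Specifically, I am using pd.cut to create a new column, game_phase (named phase in the code below), that records whether a given move was played in the opening, middlegame, or endgame.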
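I am using the following code to do this. Note that each game has to be split into opening, middlegame, and endgame bins according to the total number of moves in that game.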
def define_move_phase(x):
    bins = (0, round(x['move_number'].max() * 1/3), round(x['move_number'].max() * 2/3), x['move_number'].max())
    phases = ["opening", "middlegame", "endgame"]
    try:
        x.loc[:, 'phase'] = pd.cut(x['move_number'], bins, labels=phases)
    except ValueError:
        x.loc[:, 'phase'] = None
    print(x)

df.groupby('game_id').apply(define_move_phase)
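The print statement in the function shows that the function is working on the individual groups (see below), but the phase column is not added to the original DataFrame.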
game_id move_number colour avg_centi phase
0 03gDhPWr 1 white NaN opening
1 03gDhPWr 2 black 37.0 opening
2 03gDhPWr 3 white 61.0 opening
3 03gDhPWr 4 black -5.0 opening
4 03gDhPWr 5 white 26.0 opening
5 03gDhPWr 6 black 31.0 opening
6 03gDhPWr 7 white -2.0 opening
.. ... ... ... ... ...
54 03gDhPWr 55 white 58.0 endgame
55 03gDhPWr 56 black 26.0 endgame
56 03gDhPWr 57 white 116.0 endgame
57 03gDhPWr 58 black 2000.0 endgame
58 03gDhPWr 59 white 0.0 endgame
59 03gDhPWr 60 black 0.0 endgame
60 03gDhPWr 61 white NaN endgame
[61 rows x 5 columns]
game_id move_number colour avg_centi phase
0 03gDhPWr 1 white NaN opening
1 03gDhPWr 2 black 37.0 opening
2 03gDhPWr 3 white 61.0 opening
3 03gDhPWr 4 black -5.0 opening
4 03gDhPWr 5 white 26.0 opening
5 03gDhPWr 6 black 31.0 opening
6 03gDhPWr 7 white -2.0 opening
.. ... ... ... ... ...
54 03gDhPWr 55 white 58.0 endgame
55 03gDhPWr 56 black 26.0 endgame
56 03gDhPWr 57 white 116.0 endgame
57 03gDhPWr 58 black 2000.0 endgame
58 03gDhPWr 59 white 0.0 endgame
59 03gDhPWr 60 black 0.0 endgame
60 03gDhPWr 61 white NaN endgame
[61 rows x 5 columns]
etc...
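I would like to add the new phase column to the original DataFrame, or alternatively recombine the grouped DataFrames into one large DataFrame. What is the best way to do this?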
Answer 0 (score: 1)
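Your function has no return statement. groupby(...).apply(...) builds its result from whatever the function returns for each group, so a function that only prints returns None and leaves nothing to combine.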
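The most direct fix is to end define_move_phase with return x and keep the result of the apply call. As a rough sketch of one variant (not necessarily the only or best approach), the function below returns just the new phase values for each game, and those are assigned back to the original DataFrame by index. The helper name phase_for_game and the small sample frame are made up for illustration. The binning logic follows the question's code.

```python
import pandas as pd

def phase_for_game(moves):
    """Return a phase label for every move of one game (hypothetical helper)."""
    last_move = int(moves.max())
    # Same bin edges as in the question: thirds of the game's total move count.
    bins = (0, round(last_move * 1 / 3), round(last_move * 2 / 3), last_move)
    phases = ["opening", "middlegame", "endgame"]
    try:
        # The return value is what apply() collects for this group.
        return pd.cut(moves, bins, labels=phases)
    except ValueError:
        # Very short games give non-increasing bin edges; fall back to None.
        return pd.Series(None, index=moves.index, dtype=object)

# Small made-up sample standing in for the real data.
df = pd.DataFrame({
    "game_id": ["a"] * 6 + ["b"] * 3,
    "move_number": [1, 2, 3, 4, 5, 6, 1, 2, 3],
})

# group_keys=False keeps the original row index, so the result
# aligns with df when it is assigned back as a new column.
df["phase"] = (
    df.groupby("game_id", group_keys=False)["move_number"]
      .apply(phase_for_game)
)
print(df)
```

Selecting the move_number column before apply means the function only ever sees the values it needs, which should also sidestep differences between pandas versions in how apply treats the grouping column.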