I have written a Python script that calculates winning streaks from sports results.
For example, the DataFrame for player A looks like this:
time winner loser streak
1 A B 1
2 A C 2
3 A D 3
4 B A 0
5 A F 1
6 A G 2
7 H A 0
8 A X 1
9 A Y 2
10 A Z 3
The streak column is essentially a cumulative count of wins, but it resets to 0 whenever the player loses, because a loss ends the streak.
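The question does not show how the streak column is produced. Here is a minimal sketch of one way to compute it with pandas. It assumes the DataFrame holds only player A's matches in chronological order, and `player` is a hypothetical variable that selects whose streak is counted. The idea is that every loss starts a new block, and wins are counted cumulatively within each block.

```python
import pandas as pd

# Player A's matches from the table above (time, winner, loser)
df = pd.DataFrame({
    "time": range(1, 11),
    "winner": ["A", "A", "A", "B", "A", "A", "H", "A", "A", "A"],
    "loser": ["B", "C", "D", "A", "F", "G", "A", "X", "Y", "Z"],
})

player = "A"  # hypothetical: whose streak is being counted
won = df["winner"].eq(player)  # True where the player won
# Each loss starts a new block; the running count of losses is a block id
block = (~won).cumsum()
# Count wins cumulatively within each block; a loss row itself gets 0
df["streak"] = won.astype(int).groupby(block).cumsum()
print(df)
```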
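Now I only want to output streaks longer than 2, but I obviously want every match that contributes to such a streak, not just the row where the count exceeds 2.
In other words, the query would be: give me all matches that are part of a streak longer than 2.
The result should look like this: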
time winner loser streak
1 A B 1
2 A C 2
3 A D 3
8 A X 1
9 A Y 2
10 A Z 3
How can I achieve this with pandas?
Answer 0 (score: 1)
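One solution is to first detect the change points: the rows where a streak longer than n ends, i.e. where the next row does not continue the count (it resets to 0, or the data ends). You can find them by comparing the streak column with its shifted copy:
import numpy as np
next_streak = df['streak'].shift(-1, fill_value=0)
streak_ends = np.where((df['streak'] > 2) & (next_streak <= df['streak']))[0]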
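Then you only need the start of each streak, which is the end position minus the streak length, plus one because the slice includes the end row: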
streaks = [slice(idx - df.loc[idx, 'streak'] + 1, idx + 1) for idx in streak_ends]
streaks
Out[86]: [slice(0, 3, None), slice(7, 10, None)]
df.iloc[streaks[1]][['winner', 'streak']]
Out[87]:
winner streak
7 A 1
8 A 2
9 A 3
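For reference, here is a self-contained sketch of this approach that also stacks every qualifying streak into a single DataFrame with `pd.concat`, which gives the result table from the question. It rebuilds player A's data from the question. `n` is a hypothetical threshold name. A streak end is taken to be a row whose successor does not continue the count. The slices are positional, so they are applied with `iloc`, since `loc` would treat the stop value as an inclusive label.

```python
import numpy as np
import pandas as pd

# Player A's matches, including the streak column, from the question
df = pd.DataFrame({
    "time": range(1, 11),
    "winner": ["A", "A", "A", "B", "A", "A", "H", "A", "A", "A"],
    "loser": ["B", "C", "D", "A", "F", "G", "A", "X", "Y", "Z"],
    "streak": [1, 2, 3, 0, 1, 2, 0, 1, 2, 3],
})
n = 2  # hypothetical threshold: keep streaks strictly longer than n

# A streak ends where the next row does not continue it (or the data ends)
next_streak = df["streak"].shift(-1, fill_value=0)
streak_ends = np.where((df["streak"] > n) & (next_streak <= df["streak"]))[0]

# Positional slice from the first win of each streak up to its last win
streaks = [slice(int(end - df["streak"].iloc[end] + 1), int(end + 1))
           for end in streak_ends]

# Stack all qualifying streaks; fall back to an empty frame if there are none
result = pd.concat([df.iloc[s] for s in streaks]) if streaks else df.iloc[0:0]
print(result)
```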
UPDATE
It turns out itertools.groupby does this better:
import itertools
df['A wins'] = df.winner == 'A'
# itertools.groupby groups *consecutive* equal values: runs of A wins / non-wins
groups = [list(g) for _, g in itertools.groupby(df['A wins'])]
# filter out streaks that are not longer than 2 by replacing them with False
# itertools.chain.from_iterable flattens the groups back into one boolean mask
streaks = list(itertools.chain.from_iterable(
    g if len(g) > 2 else [False] * len(g) for g in groups))
df.loc[streaks, ['winner', 'streak']]
Out[83]:
winner streak
0 A 1
1 A 2
2 A 3
7 A 1
8 A 2
9 A 3
10 A 4
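The same run-length idea can also be expressed without leaving pandas. Comparing the win mask with its shifted copy and taking `cumsum` labels each run of consecutive equal values. `groupby(...).transform("size")` then broadcasts each run's length back to its rows. This is a sketch that assumes player A's data from the question. `n` is a hypothetical threshold name.

```python
import pandas as pd

# Player A's matches, including the streak column, from the question
df = pd.DataFrame({
    "time": range(1, 11),
    "winner": ["A", "A", "A", "B", "A", "A", "H", "A", "A", "A"],
    "loser": ["B", "C", "D", "A", "F", "G", "A", "X", "Y", "Z"],
    "streak": [1, 2, 3, 0, 1, 2, 0, 1, 2, 3],
})
n = 2  # hypothetical threshold: keep runs strictly longer than n

won = df["winner"].eq("A")  # same mask as the 'A wins' column
# A new run starts wherever the mask changes; cumsum turns that into a run id
run_id = won.ne(won.shift(fill_value=False)).cumsum()
# Length of the run each row belongs to, aligned with the original rows
run_len = won.groupby(run_id).transform("size")
# Keep rows that are wins inside a sufficiently long run
result = df[won & (run_len > n)]
print(result)
```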