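I am trying to build a cumulative monthly YTD calculation in which, based on a threshold, a 'player' is counted only once in the numerator and denominator (as long as they have a recorded score). Once a player enters the "met threshold" group, they stay in that group even if they later record a score that does not meet the threshold.
I don't know what this kind of model is called, but I would appreciate any ideas on how to implement it, or considerations about this logic, so that I can research it myself.
Here is a sample of the input dataframe: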
Player Month Score Qualified?
A January 3 N
A February 4 Y
A March 5 Y
A April 5 Y
B January 4 Y
B February 3 N
C March 5 Y
D February 3 N
D March 4 Y
D April 3 N
E April 1 N
Output (player names are shown only to help follow the logic):
Month Qualified Players Players
January 1 (B) 2 (A, B)
February 2 (A, B) 3 (A, B, D)
March 4 (A, B, C, D) 4 (A, B, C, D)
April 4 (A, B, C, D) 5 (A, B, C, D, E)
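Update: The above is the simplest model. At a deeper level, I would like multiple threshold groups where, similarly, a player can move up to a higher threshold group but never move down. For example:
Threshold groups = Low (1-2), Medium (3-4), High (5)
Input df (same as above):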
Player Month Score Qualified?
A January 3 N
A February 4 Y
A March 5 Y
A April 5 Y
B January 4 Y
B February 3 N
C March 5 Y
D February 3 N
D March 4 Y
D April 3 N
E April 1 N
Output df:
Month Threshold Group Player Count
1 Low 0
1 Medium 2 (A, B)
1 High 0
2 Low 0
2 Medium 3 (A, B, D)
2 High 0
3 Low 0
3 Medium 2 (B, D)
3 High 2 (A, C)
4 Low 1 (E)
4 Medium 2 (B, D)
4 High 2 (A, C)
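One way to express the "move up but never down" rule is a running maximum of each player's tier, carried forward through months in which the player has no score. The following is only a minimal sketch under stated assumptions: the bin edges are taken from the threshold groups listed above, each player has at most one score per month, and the names `tiers`, `wide`, `best`, `long_form` and `counts` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'player': list('AAAABBCDDDE'),
    'month':  [1, 2, 3, 4, 1, 2, 3, 2, 3, 4, 4],
    'score':  [3, 4, 5, 5, 4, 3, 5, 3, 4, 3, 1],
})

# Assumed tier boundaries: Low = 1-2, Medium = 3-4, High = 5.
tiers = ['Low', 'Medium', 'High']
# labels=False yields ordinal codes 0, 1, 2 so tiers can be compared.
df['tier'] = pd.cut(df['score'], bins=[0, 2, 4, 5], labels=False)

# One row per month, one column per player; NaN where a player has no score.
# pivot() raises if a player has more than one score in the same month.
wide = df.pivot(index='month', columns='player', values='tier')

# Running maximum keeps the best tier reached so far; ffill carries it
# through months without a score. Players not yet seen stay NaN.
best = wide.cummax().ffill()

# Back to long form, dropping players who have not appeared yet.
long_form = (best.reset_index()
                 .melt(id_vars='month', var_name='player', value_name='tier')
                 .dropna(subset=['tier'])
                 .astype({'tier': int}))

# Count players per month and tier, keeping tiers with zero players.
counts = (pd.crosstab(long_form['month'], long_form['tier'])
            .reindex(columns=range(len(tiers)), fill_value=0))
counts.columns = tiers
print(counts)
```

On the sample data, `counts` has one row per month and one column per threshold group, and its values match the player counts in the output df above (for example month 4: Low 1, Medium 2, High 2).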
Answer 0 (score: 1)
How about:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(data={'player':list('AAAABBCDDDE'), 'month':[1,2,3,4,1,2,3,2,3,4,4], 'score':[3,4,5,5,4,3,5,3,4,3,1]})
>>> df
month player score
0 1 A 3
1 2 A 4
2 3 A 5
3 4 A 5
4 1 B 4
5 2 B 3
6 3 C 5
7 2 D 3
8 3 D 4
9 4 D 3
10 4 E 1
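>>> # Concatenate the names of all players with a score in each month
>>> res = (df.groupby('month')
...          .apply(func=lambda x: ''.join(x.player.values))
...          .rename('active')
...          .to_frame())
>>> # Same, but only players whose score meets the threshold (>= 4)
>>> res['qualified'] = (df.groupby('month')
...                       .apply(func=lambda x: ''.join(x[x.score>=4].player.values)))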
>>> res
active qualified
month
1 AB B
2 ABD A
3 ACD ACD
4 ADE A
>>> res.cumsum().applymap(lambda x: np.unique(list(x)))
active qualified
month
1 [A, B] [B]
2 [A, B, D] [A, B]
3 [A, B, C, D] [A, B, C, D]
4 [A, B, C, D, E] [A, B, C, D]
Honestly, I don't like this solution, but so far I haven't found a better way :(
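A possible alternative that avoids string concatenation is to find the first month in which each player appears and the first month in which each player qualifies, then accumulate the counts. This is only a sketch, not a tested replacement for the approach above: it assumes the same `score >= 4` cutoff, and `THRESHOLD`, `first_active`, `first_qualified` and `cumulative_count` are names made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'player': list('AAAABBCDDDE'),
    'month':  [1, 2, 3, 4, 1, 2, 3, 2, 3, 4, 4],
    'score':  [3, 4, 5, 5, 4, 3, 5, 3, 4, 3, 1],
})

THRESHOLD = 4  # assumed cutoff, matching the score >= 4 filter above

months = sorted(df['month'].unique())

# First month each player has any score, and first month each player qualifies.
first_active = df.groupby('player')['month'].min()
first_qualified = df[df['score'] >= THRESHOLD].groupby('player')['month'].min()

def cumulative_count(first_month):
    """Count players by their first month, fill empty months with 0, accumulate."""
    return (first_month.value_counts()
                       .reindex(months, fill_value=0)
                       .cumsum())

result = pd.DataFrame({
    'qualified_players': cumulative_count(first_qualified),
    'players': cumulative_count(first_active),
})
result.index.name = 'month'
print(result)
```

Because each player contributes exactly once, in the month they first appear or first qualify, a later non-qualifying score cannot remove them from the group. On the sample data this gives 1, 2, 4, 4 qualified players and 2, 3, 4, 5 players for months 1 through 4, matching the expected output in the question.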