Cumulative monthly year-to-date based on a threshold

Time: 2016-11-02 06:23:13

Tags: python pandas python-3.5 threshold

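I'm trying to create a cumulative monthly YTD calculation in which, based on a threshold, a 'player' is counted in the numerator and denominator only once (as long as they have a recorded score). A player can move into the "met threshold" group, and once there they stay in that group even if they later record a score that does not meet the threshold.

I'm not sure what to call this kind of model, but I'd appreciate any ideas on how to implement it, or considerations for this kind of logic, so that I can research it myself.

Here is an example of the input dataframe: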

 Player     Month   Score   Qualified?
 A          January     3   N
 A          February    4   Y
 A          March       5   Y
 A          April       5   Y
 B          January     4   Y
 B          February    3   N
 C          March       5   Y
 D          February    3   N
 D          March       4   Y
 D          April       3   N
 E          April       1   N

Output: (player names are shown only to help follow the logic)

 Month      Qualified Players         Players
 January    1 (B)                      2 (A, B)
 February   2 (A, B)                   3 (A, B, D)
 March      4 (A, B, C, D)             4 (A, B, C, D)
 April      4 (A, B, C, D)             5 (A, B, C, D, E)

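Update: The above is the simplest model. At a deeper level, I'd like to have multiple threshold groups where, similarly, a player can move up a threshold group but never down. For example:

Threshold groups = Low (1-2), Medium (3-4), High (5)

Input df (same as above):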

 Player     Month   Score   Qualified?
 A          January     3   N
 A          February    4   Y
 A          March       5   Y
 A          April       5   Y
 B          January     4   Y
 B          February    3   N
 C          March       5   Y
 D          February    3   N
 D          March       4   Y
 D          April       3   N
 E          April       1   N

Output df:

 Month  Threshold Group     Player Count
 1      Low                 0
 1      Medium              2 (A, B)
 1      High                0
 2      Low                 0
 2      Medium              3 (A, B, D)
 2      High                0
 3      Low                 0
 3      Medium              2 (B, D)
 3      High                2 (A, C)
 4      Low                 1 (E)
 4      Medium              2 (B, D)
 4      High                2 (A, C)
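
The "move up but never down" rule above can be sketched with a running maximum per player. This is only a minimal sketch under stated assumptions: each player has at most one score per month, the bin edges are read off the Low (1-2), Medium (3-4), High (5) definition, and names such as `grid`, `best_so_far` and `counts` are illustrative rather than part of the original post.

```python
import pandas as pd

df = pd.DataFrame({
    "player": list("AAAABBCDDDE"),
    "month": [1, 2, 3, 4, 1, 2, 3, 2, 3, 4, 4],
    "score": [3, 4, 5, 5, 4, 3, 5, 3, 4, 3, 1],
})

# Player x month grid of scores; NaN where a player has no record that month.
# Assumes (player, month) pairs are unique.
grid = df.pivot(index="player", columns="month", values="score")

# Best score so far: running max across months, then forward-fill so a player
# keeps their best score in months without a record. Months before a player's
# first record stay NaN, so the player is not counted yet.
best_so_far = grid.cummax(axis=1).ffill(axis=1)

# Back to long form, dropping the months before each player's first record.
long = (best_so_far.reset_index()
        .melt(id_vars="player", var_name="month", value_name="best_score")
        .dropna(subset=["best_score"]))
long["month"] = long["month"].astype(int)

# Map the best score so far to a threshold group (assumed bin edges).
groups = ["Low", "Medium", "High"]
long["group"] = pd.cut(long["best_score"], bins=[0, 2, 4, 5], labels=groups).astype(str)

# Count players per month and group; groups with no players show as 0.
counts = (pd.crosstab(long["month"], long["group"])
          .reindex(columns=groups, fill_value=0))
print(counts)
```

With the sample data, the counts agree with the expected output table above, for example 2 Medium and 2 High players in month 3.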

1 answer:

Answer 0 (score: 1)

How about:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(data={'player':list('AAAABBCDDDE'), 'month':[1,2,3,4,1,2,3,2,3,4,4], 'score':[3,4,5,5,4,3,5,3,4,3,1]})
>>> df

    month player  score
0       1      A      3
1       2      A      4
2       3      A      5
3       4      A      5
4       1      B      4
5       2      B      3
6       3      C      5
7       2      D      3
8       3      D      4
9       4      D      3
10      4      E      1

>>> # Per month, join the names of all players with a recorded score
>>> res = (df.groupby('month')
...          .apply(func=lambda x: ''.join(x.player.values))
...          .rename('active')
...          .to_frame())

>>> # Per month, join the names of players whose score meets the threshold (>= 4)
>>> res['qualified'] = (df.groupby('month')
...                       .apply(func=lambda x: ''.join(x[x.score>=4].player.values)))

>>> res

      active qualified
month                 
1         AB         B
2        ABD         A
3        ACD       ACD
4        ADE         A

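>>> # Concatenate the strings cumulatively, then reduce each one to its sorted unique
>>> # letters (this relies on single-character player names).
>>> # Note: in pandas 2.1+, applymap is deprecated in favor of DataFrame.map.
>>> res.cumsum().applymap(lambda x: np.unique(list(x)))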

                active     qualified
month                               
1               [A, B]           [B]
2            [A, B, D]        [A, B]
3         [A, B, C, D]  [A, B, C, D]
4      [A, B, C, D, E]  [A, B, C, D]

Honestly, I don't like this solution, but so far I haven't found a better way :(
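
One possible alternative that avoids string concatenation is to find the first month in which each player appears and the first month in which they meet the threshold, then take a running total of those "entry" months. This is a sketch rather than a definitive implementation: the cut-off of 4 is inferred from `score>=4` in the code above, and `first_active`, `first_qualified` and `cumulative_count` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "player": list("AAAABBCDDDE"),
    "month": [1, 2, 3, 4, 1, 2, 3, 2, 3, 4, 4],
    "score": [3, 4, 5, 5, 4, 3, 5, 3, 4, 3, 1],
})
THRESHOLD = 4  # assumed cut-off, taken from the score>=4 filter above

months = sorted(df["month"].unique())

# First month in which each player has any recorded score.
first_active = df.groupby("player")["month"].min()
# First month in which each player meets the threshold
# (players who never meet it are simply absent).
first_qualified = df[df["score"] >= THRESHOLD].groupby("player")["month"].min()


def cumulative_count(first_month):
    """Count players entering in each month, then take the running total."""
    return (first_month.value_counts()
            .reindex(months, fill_value=0)
            .cumsum())


result = pd.DataFrame({
    "qualified_players": cumulative_count(first_qualified),
    "players": cumulative_count(first_active),
})
result.index.name = "month"
print(result)
```

Because each player contributes exactly one entry month per column, they are counted once and never leave a group, and player names of any length work.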