直到满足条件的两个日期之间的天数差异

时间:2019-11-18 22:56:01

标签: python-3.x pandas

在找到条件之前,我很难计算连续的天数。 下表中给出了Gap done是我以解决方案形式there获得的混乱表,而Expected gap是我想要获得的输出。

+--------+------------+---------------------+----------+----------------------------------------------------------------------------------------------+
| Player |   Result   |        Date         | Gap done |                                         Expected Gap                                         |
+--------+------------+---------------------+----------+----------------------------------------------------------------------------------------------+
| K2000  | Lose       | 2015-11-13 13:42:00 | Nan      | Nan/0                                                                                        |
| K2000  | Lose       | 2016-03-23 16:40:00 | 131.0    | 131.0                                                                                        |
| K2000  | Lose       | 2016-05-16 19:17:00 | 54.0     | 185.0                                                                                        |
| K2000  | Win        | 2016-06-09 19:36:00 | 54.0     | 239.0 #he always lose before                                                                 |
| K2000  | Win        | 2016-06-30 14:05:00 | 54.0     | 54.0 #because he won last time, it's 54 days btw this current date and the last time he won. |
| K2000  | Lose       | 2016-07-29 16:20:00 | 29.0     | 29.0                                                                                         |
| K2000  | Win        | 2016-10-08 17:48:00 | 29.0     | 58.0                                                                                         |
| Kssis  | Lose       | 2007-02-25 15:05:00 | Nan      | Nan/0                                                                                        |
| Kssis  | Lose       | 2007-04-25 16:07:00 | 59.0     | 59.0                                                                                         |
| Kssis  | Not ranked | 2007-06-01 16:54:00 | 37.0     | 96.0                                                                                         |
| Kssis  | Lose       | 2007-09-09 14:33:00 | 99.0     | 195.0                                                                                        |
| Kssis  | Lose       | 2008-04-06 16:27:00 | 210.0    | 405.0                                                                                        |
+--------+------------+---------------------+----------+----------------------------------------------------------------------------------------------+

解决方案there的问题在于它实际上并未计算日期。在此示例中,日期有可能总是相隔1天。

确定我适应了

def sum_days_in_row_with_condition(g):
    sorted_g = g.sort_values(by='date', ascending=True)
    condition = sorted_g['Result'] == 'Win'
    sorted_g['days-in-a-row'] = g.date.diff().dt.days.where(~condition).ffill()
    return sorted_g

但是,正如我向您展示的那样,这很混乱。

所以我考虑了一个解决方案,但是它需要全局变量(功能失效),这有点讲究。

任何人都可以以更简单的方式帮助解决此问题吗?


Pandas版本:0.23.4 Python版本:3.7.4

1 个答案:

答案 0 :(得分:1)

IIUC,您需要找到布尔掩码m1,其中win的上一行也是win。从m1创建一个组ID s,以分隔组win。将它们分成组并cumsum

m = df.Result.eq('Win')
m1 = m & m.shift()
s = m1.ne(m1.shift()).cumsum()
df['Expected Gap'] = df.groupby(['Player', s])['Gap done'].cumsum()

Out[808]:
   Player      Result                 Date  Gap done  Expected Gap
0   K2000        Lose  2015-11-13 13:42:00      NaN           NaN
1   K2000        Lose  2016-03-23 16:40:00    131.0         131.0
2   K2000        Lose  2016-05-16 19:17:00     54.0         185.0
3   K2000         Win  2016-06-09 19:36:00     54.0         239.0
4   K2000         Win  2016-06-30 14:05:00     54.0          54.0
5   K2000        Lose  2016-07-29 16:20:00     29.0          29.0
6   K2000         Win  2016-10-08 17:48:00     29.0          58.0
7   Kssis        Lose  2007-02-25 15:05:00      NaN           NaN
8   Kssis        Lose   2007-04-25 6:07:00     59.0          59.0
9   Kssis  Not-ranked  2007-06-01 16:54:00     37.0          96.0
10  Kssis        Lose  2007-09-09 14:33:00     99.0         195.0
11  Kssis        Lose  2008-04-06 16:27:00    210.0         405.0