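I want to handle overlapping time intervals so that I can compute the real (non-overlapping) duration. I managed to identify the overlaps with the group_overl column, but I can't work out how to process them.
Here is an example: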
begin end duration group_overl
2019-10-21 07:39:26.356716 2019-10-21 07:42:02.574268 156.218 1
2019-10-21 07:40:03.235327 2019-10-21 07:42:02.222821 118.987 1
2019-10-21 07:42:52.299657 2019-10-21 07:43:19.834114 27.534 2
2019-10-21 07:44:09.936458 2019-10-21 07:44:37.143862 27.207 3
2019-10-21 07:45:27.488518 2019-10-21 07:45:54.122312 26.634 4
2019-10-21 07:57:27.564887 2019-10-21 08:26:00.413448 1712.849 11
2019-10-21 07:58:06.209659 2019-10-21 08:27:00.413448 1734.204 11
...
2020-06-12 16:22:41.855968 2020-06-12 16:23:31.073432 49.22 6421
2020-06-12 16:51:06.793336 2020-06-15 09:51:46.179767 234039.39 6422 <---Modification 3
2020-06-13 02:04:55.892438 2020-06-13 02:05:03.687360 58397 6422
2020-06-13 02:04:55.892443 2020-06-13 02:05:03.687365 0 6422
2020-06-13 02:04:55.892448 2020-06-13 02:05:03.687370 0 6422
2020-06-13 02:04:55.892452 2020-06-13 02:05:03.687374 0 6422
2020-06-13 02:04:56.217053 2020-06-13 02:05:03.979850 0 6422
2020-06-13 02:04:56.217058 2020-06-15 09:16:11.529059 25867 6422
2020-06-15 09:58:31.855886 2020-06-15 09:59:33.736478 61.88 6423
Expected result:
begin end duration group_overl
2019-10-21 07:39:26.356716 2019-10-21 07:42:02.574268 156.218 1
<---Modification 1
2019-10-21 07:42:52.299657 2019-10-21 07:43:19.834114 27.534 2
2019-10-21 07:44:09.936458 2019-10-21 07:44:37.143862 27.207 3
2019-10-21 07:45:27.488518 2019-10-21 07:45:54.122312 26.634 4
2019-10-21 07:57:27.564887 2019-10-21 08:26:00.413448 1712.849 11 <---row 5
2019-10-21 08:26:00.413448 2019-10-21 08:27:00.413448 60 11 <---Modification 2 (begin & duration)
...
2020-06-12 16:22:41.855968 2020-06-12 16:23:31.073432 49.22 6421
2020-06-12 16:51:06.793336 2020-06-15 09:51:46.179767 234039.39 6422
<---Modification 3
2020-06-15 09:58:31.855886 2020-06-15 09:59:33.736478 61.88 6423
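So, with the help of @David Erickson, I tried this code:
import pandas as pd
# c1: the row starts AND ends inside the previous row's [begin_, end_] (fully contained)
c1 = (df_overl['begin_'].between(df_overl['begin_'].shift(), df_overl['end_'].shift())
      & df_overl['end_'].between(df_overl['begin_'].shift(), df_overl['end_'].shift()))
# c2: the row starts inside the previous row but ends after it (partial overlap)
c2 = (df_overl['begin_'].between(df_overl['begin_'].shift(), df_overl['end_'].shift())
      & df_overl['end_'].gt(df_overl['end_'].shift()))
# drop fully contained rows (.copy() avoids SettingWithCopyWarning on the next assignment)
df_overl = df_overl[~c1].copy()
# for partial overlaps, count only the part after the previous remaining row's end
df_overl['duration_'] = df_overl['duration_'].where(~c2, (df_overl['end_'] - df_overl['end_'].shift()).dt.total_seconds())
df_overl = df_overl.reset_index(drop=True)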
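Modification 1 works. Modification 2 only half works: the duration is correct, but the begin is not updated. Modification 3 doesn't work at all...
This is my current result: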
2019-10-21 07:39:26.356716 2019-10-21 07:42:02.574268 156.218 1
2019-10-21 07:42:52.299657 2019-10-21 07:43:19.834114 27.534 2
2019-10-21 07:44:09.936458 2019-10-21 07:44:37.143862 27.207 3
2019-10-21 07:45:27.488518 2019-10-21 07:45:54.122312 26.634 4
2019-10-21 07:57:27.564887 2019-10-21 08:26:00.413448 1712.849 11
2019-10-21 07:58:06.209659 2019-10-21 08:27:00.413448 60 11
...
2020-06-12 16:22:41.855968 2020-06-12 16:23:31.073432 49.22 6421
2020-06-12 16:51:06.793336 2020-06-15 09:51:46.179767 234039.39 6422
2020-06-13 02:04:55.892443 2020-06-13 02:05:03.687365 0 6422
2020-06-13 02:04:55.892448 2020-06-13 02:05:03.687370 0 6422
2020-06-13 02:04:55.892452 2020-06-13 02:05:03.687374 0 6422
2020-06-13 02:04:56.217053 2020-06-13 02:05:03.979850 0.29 6422
2020-06-13 02:04:56.217058 2020-06-15 09:16:11.529059 198667.55 6422
2020-06-15 09:58:31.855886 2020-06-15 09:59:33.736478 61.88 6423
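The leftover group 6422 rows in the result above hint at why Modification 3 fails: `shift()` compares each row only with the row directly before it, so a row nested inside an interval two or more rows back is never detected. A minimal sketch of one alternative, assuming the column names `begin_`, `end_` and `duration_` from the code above and already-parsed datetime columns, swaps `shift()` for a running maximum of `end_` (`cummax()`): each row's begin is clipped to the latest end seen so far, and rows with nothing left after clipping are dropped. `trim_overlaps` is a hypothetical helper, only a few of the sample rows are used, and durations are recomputed from the timestamps rather than taken from the `duration_` column.

```python
import pandas as pd


def trim_overlaps(df, begin="begin_", end="end_", duration="duration_"):
    """Sketch: drop rows fully covered by earlier rows and clip the begin of
    partially overlapping rows to the latest end seen so far."""
    # Assumption: rows are processed in begin order; reset_index gives a fresh copy.
    out = df.sort_values(begin).reset_index(drop=True)
    # Latest end among ALL earlier rows (NaT for the first row). A plain
    # shift() would only look at the single row directly before.
    prev_max_end = out[end].cummax().shift()
    # A row that starts before that latest end is clipped to start there.
    out[begin] = out[begin].mask(out[begin] < prev_max_end, prev_max_end)
    # Rows whose clipped begin is not before their end were fully covered.
    out = out[out[end] > out[begin]].copy()
    # Recompute the duration in seconds from the (possibly clipped) interval.
    out[duration] = (out[end] - out[begin]).dt.total_seconds()
    return out.reset_index(drop=True)


# A few rows copied from the sample in the question.
rows = [
    ("2019-10-21 07:39:26.356716", "2019-10-21 07:42:02.574268", 156.218, 1),
    ("2019-10-21 07:40:03.235327", "2019-10-21 07:42:02.222821", 118.987, 1),
    ("2019-10-21 07:57:27.564887", "2019-10-21 08:26:00.413448", 1712.849, 11),
    ("2019-10-21 07:58:06.209659", "2019-10-21 08:27:00.413448", 1734.204, 11),
    ("2020-06-12 16:51:06.793336", "2020-06-15 09:51:46.179767", 234039.39, 6422),
    ("2020-06-13 02:04:55.892438", "2020-06-13 02:05:03.687360", 58397, 6422),
    ("2020-06-13 02:04:56.217058", "2020-06-15 09:16:11.529059", 25867, 6422),
    ("2020-06-15 09:58:31.855886", "2020-06-15 09:59:33.736478", 61.88, 6423),
]
df_overl = pd.DataFrame(rows, columns=["begin_", "end_", "duration_", "group_overl"])
df_overl["begin_"] = pd.to_datetime(df_overl["begin_"])
df_overl["end_"] = pd.to_datetime(df_overl["end_"])

result = trim_overlaps(df_overl)
print(result)
```

On these rows it reproduces Modifications 1, 2 and 3 of the expected result, including moving the second group 11 row's begin to 08:26:00.413448. The running maximum is taken over the whole frame rather than per group, which should give the same result as long as different groups never overlap each other.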
I've tried several approaches but couldn't get it to work. Thanks for your time!
**UPDATE**
One idea I had: for each group_overl, take the minimum begin date and the maximum end date and compute the duration from those?
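A minimal sketch of that min/max idea, assuming each group_overl value covers one connected run of overlapping intervals (if a group could contain a gap, max end minus min begin would count the gap too). The helper name `group_span` and the output column names are hypothetical; the rows are copied from the sample in the question.

```python
import pandas as pd


def group_span(df, begin="begin_", end="end_", group="group_overl"):
    """Sketch: one row per group with the earliest begin, the latest end and
    the span in seconds. Only meaningful if each group has no internal gap."""
    out = df.groupby(group, as_index=False).agg(
        begin_=(begin, "min"),
        end_=(end, "max"),
    )
    out["duration_"] = (out["end_"] - out["begin_"]).dt.total_seconds()
    return out


rows = [
    ("2019-10-21 07:39:26.356716", "2019-10-21 07:42:02.574268", 1),
    ("2019-10-21 07:40:03.235327", "2019-10-21 07:42:02.222821", 1),
    ("2019-10-21 07:57:27.564887", "2019-10-21 08:26:00.413448", 11),
    ("2019-10-21 07:58:06.209659", "2019-10-21 08:27:00.413448", 11),
    ("2020-06-12 16:51:06.793336", "2020-06-15 09:51:46.179767", 6422),
    ("2020-06-13 02:04:56.217058", "2020-06-15 09:16:11.529059", 6422),
]
df_overl = pd.DataFrame(rows, columns=["begin_", "end_", "group_overl"])
df_overl["begin_"] = pd.to_datetime(df_overl["begin_"])
df_overl["end_"] = pd.to_datetime(df_overl["end_"])

spans = group_span(df_overl)
print(spans)
```

This collapses each group to a single row instead of keeping the row-level layout of the expected result, but the per-group total agrees with it: for group 11 the span is 1772.848561 s, i.e. 1712.849 + 60 from the expected rows.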