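I have two DataFrames: one contains login and logout times, and the other contains event details. For each day, I need to compute the time gaps between events, also taking that day's login and logout times into account.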
df1:
+------+-----------------------------+----------+-----------------------------+---------+
| Id | Start | PersonId | End | ShiftId |
+------+-----------------------------+----------+-----------------------------+---------+
| 3258 | 2019-09-24 08:15:03.1661178 | 10102 | 2019-09-24 08:28:10.7341840 | 191 |
| 3262 | 2019-09-24 08:40:07.8842560 | 10102 | 2019-09-24 09:00:57.9763744 | 191 |
| 3268 | 2019-09-24 10:14:18.2592931 | 10102 | 2019-09-24 10:32:38.8476858 | 191 |
| 3272 | 2019-09-24 11:57:49.3470913 | 10102 | 2019-09-24 12:09:06.1498356 | 191 |
| 3280 | 2019-09-24 14:10:53.2358758 | 10102 | 2019-09-24 14:38:43.6855268 | 191 |
| 3287 | 2019-09-25 04:44:18.8789158 | 10102 | 2019-09-25 05:27:31.7607861 | 201 |
| 3291 | 2019-09-25 06:21:27.9382344 | 10102 | 2019-09-25 06:29:34.5009788 | 201 |
| 3293 | 2019-09-25 06:50:58.7228054 | 10102 | 2019-09-25 07:33:53.7195993 | 201 |
| 3309 | 2019-09-25 11:33:55.5238972 | 10102 | 2019-09-25 11:48:11.8716401 | 201 |
| 3313 | 2019-09-25 11:55:09.3772345 | 10102 | 2019-09-25 12:11:01.2300854 | 201 |
| 3319 | 2019-09-25 12:44:19.3644289 | 10102 | 2019-09-25 13:32:34.4384967 | 201 |
| 3323 | 2019-09-25 14:37:28.4818603 | 10102 | 2019-09-25 15:06:48.5209333 | 201 |
+------+-----------------------------+----------+-----------------------------+---------+
df2:
+----------+-----------------------------+-----------------------------+
| PersonId | LoginTime | LogoutTime |
+----------+-----------------------------+-----------------------------+
| 10102 | 2019-09-24 05:07:27.3883395 | 2019-09-24 15:49:07.4924940 |
| 10102 | 2019-09-25 03:25:32.8983664 | 2019-09-25 15:54:45.5404037 |
| 10102 | 2019-09-26 02:28:53.1234933 | 2019-09-26 15:00:10.1138188 |
+----------+-----------------------------+-----------------------------+
Desired output:
+------+-----------------------------+----------+-----------------------------+---------+------------+
| Id | Start | PersonId | End | ShiftId | After |
+------+-----------------------------+----------+-----------------------------+---------+------------+
| 3258 | 2019-09-24 08:15:03.1661178 | 10102 | 2019-09-24 08:28:10.7341840 | 191 | 188 mins |--> df1[start]-df2[logon] of same date
| 3262 | 2019-09-24 08:40:07.8842560 | 10102 | 2019-09-24 09:00:57.9763744 | 191 | 12min |
| 3268 | 2019-09-24 10:14:18.2592931 | 10102 | 2019-09-24 10:32:38.8476858 | 191 | 14 min |
| 3272 | 2019-09-24 11:57:49.3470913 | 10102 | 2019-09-24 12:09:06.1498356 | 191 | 85 min |
| 3280 | 2019-09-24 14:10:53.2358758 | 10102 | 2019-09-24 14:38:43.6855268 | 191 | 71 min |
| 3287 | 2019-09-25 04:44:18.8789158 | 10102 | 2019-09-25 05:27:31.7607861 | 201 | 79min |
| 3291 | 2019-09-25 06:21:27.9382344 | 10102 | 2019-09-25 06:29:34.5009788 | 201 | 54 min |
| 3293 | 2019-09-25 06:50:58.7228054 | 10102 | 2019-09-25 07:33:53.7195993 | 201 | 21min |
| 3309 | 2019-09-25 11:33:55.5238972 | 10102 | 2019-09-25 11:48:11.8716401 | 201 | 4 hrs |
| 3313 | 2019-09-25 11:55:09.3772345 | 10102 | 2019-09-25 12:11:01.2300854 | 201 | 7 min |
| 3319 | 2019-09-25 12:44:19.3644289 | 10102 | 2019-09-25 13:32:34.4384967 | 201 | 33mins |
| 3323 | 2019-09-25 14:37:28.4818603 | 10102 | 2019-09-25 15:06:48.5209333 | 201 | 65mins |
+------+-----------------------------+----------+-----------------------------+---------+------------+
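Basically, I am trying to compute the gaps between events: for example, the first event happens 188 minutes after login, the second event happens 12 minutes after the first one ends, and so on, separately for each day. My current approach loops over days and months. It works, but it is slow and messy. I need a pandas approach that produces the desired output. Here is what I have tried so far: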
df1['Date']=df1['End'].dt.date
df1['Day'] = df1['End'].dt.day
df1['f']=df1.groupby(['PersonId', 'Day'])['Start'].shift(-1)-df1['End']
df1['f']=df1['f'].shift()
This gives me the time between events in df1, but how do I now compute the time difference between login and the first event, and between the last event and logout?
Thanks for your help.
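
For reference, here is a minimal sketch of one loop-free way this could be approached; it is not a definitive solution. It assumes each person has at most one df2 row per calendar date, that a session does not cross midnight, and that events belong to the date of their Start. The helper name add_gaps and the extra column ToLogout are hypothetical names introduced only for this illustration, and the sample frames are a small subset of the data above with timestamps truncated to seconds.

```python
import pandas as pd

# Subset of the question's data (timestamps truncated to seconds for brevity).
df1 = pd.DataFrame({
    "Id": [3258, 3262, 3287, 3291],
    "Start": pd.to_datetime([
        "2019-09-24 08:15:03", "2019-09-24 08:40:07",
        "2019-09-25 04:44:18", "2019-09-25 06:21:27",
    ]),
    "PersonId": [10102] * 4,
    "End": pd.to_datetime([
        "2019-09-24 08:28:10", "2019-09-24 09:00:57",
        "2019-09-25 05:27:31", "2019-09-25 06:29:34",
    ]),
    "ShiftId": [191, 191, 201, 201],
})
df2 = pd.DataFrame({
    "PersonId": [10102, 10102],
    "LoginTime": pd.to_datetime(["2019-09-24 05:07:27", "2019-09-25 03:25:32"]),
    "LogoutTime": pd.to_datetime(["2019-09-24 15:49:07", "2019-09-25 15:54:45"]),
})


def add_gaps(events, sessions):
    """Hypothetical helper: add per-day gap columns (in minutes) to the events."""
    ev = events.sort_values(["PersonId", "Start"]).copy()
    # Assumption: an event belongs to the calendar date of its Start.
    ev["Date"] = ev["Start"].dt.normalize()
    # Assumption: one login/logout row per person per calendar date.
    ses = sessions.assign(Date=sessions["LoginTime"].dt.normalize())
    ev = ev.merge(ses, on=["PersonId", "Date"], how="left")

    grp = ev.groupby(["PersonId", "Date"])
    # Previous event's End within the same day; the first event of a day
    # has no predecessor, so fall back to that day's LoginTime.
    prev_end = grp["End"].shift().fillna(ev["LoginTime"])
    ev["After"] = (ev["Start"] - prev_end).dt.total_seconds() / 60

    # Only the last event of each day gets a gap to LogoutTime; others stay NaN.
    is_last = grp.cumcount(ascending=False) == 0
    to_logout = (ev["LogoutTime"] - ev["End"]).where(is_last)
    ev["ToLogout"] = to_logout.dt.total_seconds() / 60

    return ev.drop(columns=["Date", "LoginTime", "LogoutTime"])


result = add_gaps(df1, df2)
print(result[["Id", "After", "ToLogout"]].round(1))
```

On this subset, After rounds to 188, 12, 79 and 54 minutes for Ids 3258, 3262, 3287 and 3291, which matches the corresponding rows of the desired output. Grouping on the full date rather than on the day of the month keeps the same day number in different months from being merged into one group.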