Pandas: calculate the time difference between column values of two different dataframes

Time: 2019-11-23 12:12:23

Tags: python pandas dataframe group-by

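I have 2 dataframes: one contains login and logout times, the other contains event details. I need to calculate the time difference between events for each day, taking the login and logout times into account as well.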

df1:

+------+-----------------------------+----------+-----------------------------+---------+
|  Id  |            Start            | PersonId |             End             | ShiftId |
+------+-----------------------------+----------+-----------------------------+---------+
| 3258 | 2019-09-24 08:15:03.1661178 |    10102 | 2019-09-24 08:28:10.7341840 |     191 |
| 3262 | 2019-09-24 08:40:07.8842560 |    10102 | 2019-09-24 09:00:57.9763744 |     191 |
| 3268 | 2019-09-24 10:14:18.2592931 |    10102 | 2019-09-24 10:32:38.8476858 |     191 |
| 3272 | 2019-09-24 11:57:49.3470913 |    10102 | 2019-09-24 12:09:06.1498356 |     191 |
| 3280 | 2019-09-24 14:10:53.2358758 |    10102 | 2019-09-24 14:38:43.6855268 |     191 |
| 3287 | 2019-09-25 04:44:18.8789158 |    10102 | 2019-09-25 05:27:31.7607861 |     201 |
| 3291 | 2019-09-25 06:21:27.9382344 |    10102 | 2019-09-25 06:29:34.5009788 |     201 |
| 3293 | 2019-09-25 06:50:58.7228054 |    10102 | 2019-09-25 07:33:53.7195993 |     201 |
| 3309 | 2019-09-25 11:33:55.5238972 |    10102 | 2019-09-25 11:48:11.8716401 |     201 |
| 3313 | 2019-09-25 11:55:09.3772345 |    10102 | 2019-09-25 12:11:01.2300854 |     201 |
| 3319 | 2019-09-25 12:44:19.3644289 |    10102 | 2019-09-25 13:32:34.4384967 |     201 |
| 3323 | 2019-09-25 14:37:28.4818603 |    10102 | 2019-09-25 15:06:48.5209333 |     201 |
+------+-----------------------------+----------+-----------------------------+---------+

df2:


+----------+-----------------------------+-----------------------------+
| PersonId |          LoginTime          |         LogoutTime          |
+----------+-----------------------------+-----------------------------+
|    10102 | 2019-09-24 05:07:27.3883395 | 2019-09-24 15:49:07.4924940 |
|    10102 | 2019-09-25 03:25:32.8983664 | 2019-09-25 15:54:45.5404037 |
|    10102 | 2019-09-26 02:28:53.1234933 | 2019-09-26 15:00:10.1138188 |
+----------+-----------------------------+-----------------------------+

Desired output:


+------+-----------------------------+----------+-----------------------------+---------+------------+
|  Id  |            Start            | PersonId |             End             | ShiftId |   After    |
+------+-----------------------------+----------+-----------------------------+---------+------------+
| 3258 | 2019-09-24 08:15:03.1661178 |    10102 | 2019-09-24 08:28:10.7341840 |     191 | 188 min    |--> df1[Start] - df2[LoginTime] on the same date
| 3262 | 2019-09-24 08:40:07.8842560 |    10102 | 2019-09-24 09:00:57.9763744 |     191 | 12 min     |
| 3268 | 2019-09-24 10:14:18.2592931 |    10102 | 2019-09-24 10:32:38.8476858 |     191 | 14 min     |
| 3272 | 2019-09-24 11:57:49.3470913 |    10102 | 2019-09-24 12:09:06.1498356 |     191 | 85 min     |
| 3280 | 2019-09-24 14:10:53.2358758 |    10102 | 2019-09-24 14:38:43.6855268 |     191 | 71 min     |
| 3287 | 2019-09-25 04:44:18.8789158 |    10102 | 2019-09-25 05:27:31.7607861 |     201 | 79 min     |
| 3291 | 2019-09-25 06:21:27.9382344 |    10102 | 2019-09-25 06:29:34.5009788 |     201 | 54 min     |
| 3293 | 2019-09-25 06:50:58.7228054 |    10102 | 2019-09-25 07:33:53.7195993 |     201 | 21 min     |
| 3309 | 2019-09-25 11:33:55.5238972 |    10102 | 2019-09-25 11:48:11.8716401 |     201 | 4 hrs      |
| 3313 | 2019-09-25 11:55:09.3772345 |    10102 | 2019-09-25 12:11:01.2300854 |     201 | 7 min      |
| 3319 | 2019-09-25 12:44:19.3644289 |    10102 | 2019-09-25 13:32:34.4384967 |     201 | 33 min     |
| 3323 | 2019-09-25 14:37:28.4818603 |    10102 | 2019-09-25 15:06:48.5209333 |     201 | 65 min     |
+------+-----------------------------+----------+-----------------------------+---------+------------+

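Basically, I am trying to calculate the gaps between events: for example, the first event happens 188 minutes after login, the second event starts 12 minutes after the first one ends, and so on, with the calculation starting over each day. My current approach loops over days and months. It works, but it is slow and messy. I need a pandas way to get the desired output. This is what I have tried so far: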

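df1['Date'] = df1['End'].dt.date
df1['Day'] = df1['End'].dt.day
# gap between each event's End and the next event's Start, per person and day
df1['f'] = df1.groupby(['PersonId', 'Day'])['Start'].shift(-1) - df1['End']
# move each gap down one row so that it sits on the later event
df1['f'] = df1['f'].shift()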

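This gives me the time between the events in df1, but how do I now calculate the time difference between login and the first event, and between the last event and logout? Thanks for your help.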
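One possible direction, as a rough sketch rather than a confirmed solution: attach each day's login and logout row to the events with a merge on PersonId and calendar date, then use a grouped shift so that the first event of a day falls back to LoginTime. The helper name add_gaps, the temporary Date key and the BeforeLogout column are made up for illustration, and the sketch assumes at most one login/logout row per person per day and that an event and its session start on the same calendar date. The sample uses four rows from the tables above with timestamps truncated to whole seconds.

```python
import pandas as pd

# Four events and two sessions copied from the tables above (seconds only).
df1 = pd.DataFrame({
    "Id": [3258, 3262, 3287, 3291],
    "Start": pd.to_datetime(["2019-09-24 08:15:03", "2019-09-24 08:40:07",
                             "2019-09-25 04:44:18", "2019-09-25 06:21:27"]),
    "PersonId": [10102] * 4,
    "End": pd.to_datetime(["2019-09-24 08:28:10", "2019-09-24 09:00:57",
                           "2019-09-25 05:27:31", "2019-09-25 06:29:34"]),
    "ShiftId": [191, 191, 201, 201],
})
df2 = pd.DataFrame({
    "PersonId": [10102, 10102],
    "LoginTime": pd.to_datetime(["2019-09-24 05:07:27", "2019-09-25 03:25:32"]),
    "LogoutTime": pd.to_datetime(["2019-09-24 15:49:07", "2019-09-25 15:54:45"]),
})


def add_gaps(events, sessions):
    """Hypothetical helper: add 'After' and 'BeforeLogout' timedelta columns."""
    ev = events.sort_values(["PersonId", "Start"]).copy()
    ses = sessions.copy()
    # Assumption: the calendar date is enough to match an event to its session.
    ev["Date"] = ev["Start"].dt.normalize()
    ses["Date"] = ses["LoginTime"].dt.normalize()
    out = ev.merge(ses, on=["PersonId", "Date"], how="left")

    grp = out.groupby(["PersonId", "Date"])
    # End of the previous event on the same day; NaT for the day's first event.
    prev_end = grp["End"].shift()
    # First event of the day is measured from LoginTime instead.
    out["After"] = out["Start"] - prev_end.fillna(out["LoginTime"])
    # Only the last event of each day gets a gap to LogoutTime.
    is_last = grp["Start"].shift(-1).isna()
    out["BeforeLogout"] = (out["LogoutTime"] - out["End"]).where(is_last)
    return out.drop(columns=["Date", "LoginTime", "LogoutTime"])


result = add_gaps(df1, df2)
after_min = (result["After"].dt.total_seconds() / 60).round().astype(int)
print(result[["Id", "After", "BeforeLogout"]])
print(after_min.tolist())  # [188, 12, 79, 54]
```

For these four rows the rounded minutes agree with the After values in the desired output. If a person can log in more than once per day, or a session runs past midnight, the date-based merge would no longer be enough and something like pd.merge_asof on LoginTime might be a better fit.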

0 answers:

No answers