Manipulating pandas columns with datetime

Time: 2019-06-12 16:22:38

Tags: python pandas group-by transform

Please see the earlier post, Manipulating pandas columns.

In that post I shared this dataframe:

+----------+------------+-------+-----+------+
| Location |    Date    | Event | Key | Time |
+----------+------------+-------+-----+------+
| i2       | 2019-03-02 |     1 | a   |      |
| i2       | 2019-03-02 |     1 | a   |      |
| i2       | 2019-03-02 |     1 | a   |      |
| i2       | 2019-03-04 |     1 | a   |    2 |
| i2       | 2019-03-15 |     2 | b   |    0 |
| i9       | 2019-02-22 |     2 | c   |    0 |
| i9       | 2019-03-10 |     3 | d   |      |
| i9       | 2019-03-10 |     3 | d   |    0 |
| s8       | 2019-04-22 |     1 | e   |      |
| s8       | 2019-04-25 |     1 | e   |      |
| s8       | 2019-04-28 |     1 | e   |    6 |
| t14      | 2019-05-13 |     3 | f   |      |
+----------+------------+-------+-----+------+

This is a follow-up question. Consider two more columns that come after "Date", as shown below.

+-----------------------+----------------------+
| Start Time (hh:mm:ss) | Stop Time (hh:mm:ss) |
+-----------------------+----------------------+
| 13:24:38              | 14:17:39             |
| 03:48:36              | 04:17:20             |
| 04:55:05              | 05:23:48             |
| 08:44:34              | 09:13:15             |
| 19:21:05              | 20:18:57             |
| 21:05:06              | 22:01:50             |
| 14:24:43              | 14:59:37             |
| 07:57:32              | 09:46:21             |
| 19:21:05              | 20:18:57             |
| 21:05:06              | 22:01:50             |
| 14:24:43              | 14:59:37             |
| 07:57:32              | 09:46:21             |
+-----------------------+----------------------+

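The task remains the same: get the time difference, but now in hours, between the Stop Time of the first row and the Start Time of the last row for each key.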

Based on the answer to that post, I am trying something like this:

df['Time']=df.groupby(['Location','Event']).Date.\
           transform(lambda x : (x.iloc[-1]-x.iloc[0]))[~df.duplicated(['Location','Event'],keep='last')]

df['Time_h']=df.groupby(['Location','Event'])['Start Time (hh:mm:ss)','Stop Time (hh:mm:ss)'].\
            transform(lambda x,y : (x.iloc[-1]-y.iloc[0]))[~df.duplicated(['Location','Event'],keep='last')]    # This gives an error on transform 

The idea is to compute the difference in days and the difference in hours separately, then combine them. Is there a better way?
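One possible direction, sketched under assumptions rather than taken from an answer: combine "Date" with each time column into a full timestamp first, so that a single subtraction covers both the day and the hour part. The sketch below uses only the five i2 rows from the tables above, and assumes that the rows of the two tables line up one to one, that all columns are stored as strings, and that groups are formed by Location and Event as in the attempt above. The column names start_ts, stop_ts and the variable is_last are made up for illustration.

```python
import pandas as pd

# The five "i2" rows from the tables above (assumes the two tables align row by row).
df = pd.DataFrame({
    "Location": ["i2", "i2", "i2", "i2", "i2"],
    "Date": ["2019-03-02", "2019-03-02", "2019-03-02", "2019-03-04", "2019-03-15"],
    "Event": [1, 1, 1, 1, 2],
    "Key": ["a", "a", "a", "a", "b"],
    "Start Time (hh:mm:ss)": ["13:24:38", "03:48:36", "04:55:05", "08:44:34", "19:21:05"],
    "Stop Time (hh:mm:ss)": ["14:17:39", "04:17:20", "05:23:48", "09:13:15", "20:18:57"],
})

# Build full timestamps from the date and time-of-day strings
# (start_ts / stop_ts are hypothetical helper columns).
df["start_ts"] = pd.to_datetime(df["Date"] + " " + df["Start Time (hh:mm:ss)"])
df["stop_ts"] = pd.to_datetime(df["Date"] + " " + df["Stop Time (hh:mm:ss)"])

grouped = df.groupby(["Location", "Event"])

# Broadcast each group's last start and first stop back onto every row.
last_start = grouped["start_ts"].transform("last")
first_stop = grouped["stop_ts"].transform("first")

# Difference in hours, as a float.
hours = (last_start - first_stop).dt.total_seconds() / 3600

# Keep the value only on the last row of each group, as in the original "Time" column.
is_last = ~df.duplicated(["Location", "Event"], keep="last")
df["Time_h"] = hours.where(is_last)

print(df[["Location", "Date", "Event", "Key", "Time_h"]])
```

Note that for a group with a single row (key b here), this subtracts that row's own stop time from its own start time, so the result is negative. Whether such groups should be clipped to 0, as in the "Time" column above, is a choice the sketch leaves open.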

0 answers:

No answers yet