Grouping an ordered data frame based on two variables

Time: 2017-03-01 11:31:10

Tags: r dataframe grouping

I have a data frame as follows:

     Id         DateTIme Status
 1: 101 10/01/2014 09:32     On
 2: 101 10/01/2014 10:50     On
 3: 101 10/01/2014 21:32    Off
 4: 101 11/01/2014 15:32    Off
 5: 101 11/01/2014 21:21    Off
 6: 127 10/01/2014 10:13    Off
 7: 127 11/01/2014 20:21    Off
 8: 127 11/01/2014 23:10    Off
 9: 127 12/01/2014 12:02    Off
10: 127 12/01/2014 21:00     On
11: 127 13/01/2014 03:24     On
12: 763 11/01/2014 12:01    Off
13: 763 11/01/2014 22:10    Off
14: 763 12/01/2014 09:32     On
15: 763 13/01/2014 09:21     On
16: 763 13/01/2014 20:23     On
17: 763 14/01/2014 15:12     On
18: 763 14/01/2014 23:51    Off
19: 763 15/01/2014 09:23    Off

The data frame is sorted by Id and DateTIme.

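For each Id, I need to find the initial and end time of every run of consecutive rows with the same Status. So, in this case, I would like output like this: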

    Id Status      InitialTime          EndTime
1: 101     On 10/01/2014 09:32 10/01/2014 10:50
2: 101    Off 10/01/2014 21:32 11/01/2014 21:21
3: 127    Off 10/01/2014 10:13 12/01/2014 12:02
4: 127     On 12/01/2014 21:00 13/01/2014 03:24
5: 763    Off 11/01/2014 12:01 11/01/2014 22:10
6: 763     On 12/01/2014 09:32 14/01/2014 15:12
7: 763    Off 14/01/2014 23:51 15/01/2014 09:23

1 answer:

Answer 0: (score: 1)

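The OP's data appears to already be a data.table. In case it is not, convert it to a 'data.table' with setDT(df1). Grouping by 'Id', 'Status', and the run-length ID of 'Status', we take the first 'DateTIme' and the last 'DateTIme' of each group to summarise the dataset into 'InitialTime' and 'EndTime' columns, respectively.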
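To see why the run-length ID is part of the grouping, note that Id 763 has two separate 'Off' runs; grouping by 'Id' and 'Status' alone would merge them into one row. The sketch below imitates what rleid() returns for a single vector using base R only. The helper name run_id is hypothetical and is not part of data.table.

```r
# Hypothetical helper: a base-R imitation of data.table::rleid() for one vector.
# A new run starts at the first element and wherever a value differs from
# the one before it; cumsum() turns those starts into run numbers.
run_id <- function(x) cumsum(c(TRUE, x[-1L] != x[-length(x)]))

# Status values of Id 763, in the order given in the question.
status <- c("Off", "Off", "On", "On", "On", "On", "Off", "Off")

ids <- run_id(status)
print(ids)
# [1] 1 1 2 2 2 2 3 3
```

The two 'Off' runs get different IDs (1 and 3), so they stay in separate groups.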

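library(data.table)
# Group by Id, Status, and the run-length ID of Status, then take the first
# and last DateTIme of each run; finally drop the helper column Status1.
setDT(df1)[, .(InitialTime = DateTIme[1L], EndTime = DateTIme[.N]),
   .(Id, Status, Status1 = rleid(Status))][, Status1 := NULL][]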
#    Id Status      InitialTime          EndTime
#1: 101     On 10/01/2014 09:32 10/01/2014 10:50
#2: 101    Off 10/01/2014 21:32 11/01/2014 21:21
#3: 127    Off 10/01/2014 10:13 12/01/2014 12:02
#4: 127     On 12/01/2014 21:00 13/01/2014 03:24
#5: 763    Off 11/01/2014 12:01 11/01/2014 22:10
#6: 763     On 12/01/2014 09:32 14/01/2014 15:12
#7: 763    Off 14/01/2014 23:51 15/01/2014 09:23
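
Note that DateTIme[1L] and DateTIme[.N] simply pick the first and last row of each group, so the result relies on the rows already being in chronological order, as the question states. If that ordering is in doubt, one option is to parse the timestamps and sort by them first. The following is a minimal sketch, assuming DateTIme is stored as text in day/month/year hour:minute form; the sample values and the UTC time zone are assumptions made for illustration.

```r
# Assumption: timestamps are character strings like "10/01/2014 09:32".
dt_text <- c("11/01/2014 21:21", "10/01/2014 09:32", "10/01/2014 21:32")

# Parse to POSIXct; the UTC time zone is an assumption for this sketch.
parsed <- as.POSIXct(dt_text, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Reorder the original strings chronologically.
sorted_text <- dt_text[order(parsed)]
print(sorted_text)
# [1] "10/01/2014 09:32" "10/01/2014 21:32" "11/01/2014 21:21"
```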