我有一个数据框如下:
Id DateTIme Status
1: 101 10/01/2014 09:32 On
2: 101 10/01/2014 10:50 On
3: 101 10/01/2014 21:32 Off
4: 101 11/01/2014 15:32 Off
5: 101 11/01/2014 21:21 Off
6: 127 10/01/2014 10:13 Off
7: 127 11/01/2014 20:21 Off
8: 127 11/01/2014 23:10 Off
9: 127 12/01/2014 12:02 Off
10: 127 12/01/2014 21:00 On
11: 127 13/01/2014 03:24 On
12: 763 11/01/2014 12:01 Off
13: 763 11/01/2014 22:10 Off
14: 763 12/01/2014 09:32 On
15: 763 13/01/2014 09:21 On
16: 763 13/01/2014 20:23 On
17: 763 14/01/2014 15:12 On
18: 763 14/01/2014 23:51 Off
19: 763 15/01/2014 09:23 Off
数据框按Id和DateTime排序。
我需要为每个Id找到每个状态更改的初始和结束时间。所以,在这种情况下,我希望输出这样的东西:
Id Status InitialTime EndTime
1: 101 On 10/01/2014 09:32 10/01/2014 10:50
2: 101 Off 10/01/2014 21:32 11/01/2014 21:21
3: 127 Off 10/01/2014 10:13 12/01/2014 12:02
4: 127 On 12/01/2014 21:00 13/01/2014 03:24
5: 763 Off 11/01/2014 12:01 11/01/2014 22:10
6: 763 On 12/01/2014 09:32 14/01/2014 15:12
7: 763 Off 14/01/2014 23:51 15/01/2014 09:23
答案 0 :(得分:1)
OP的数据似乎已经是data.table
。万一,它不是,转换为'data.table'(setDT(df1)
),按'Id','状态'和'状态'的游程长度ID分组,我们得到第一个'DateTIme'并且最后'DateTIme'总结数据集以分别具有'InitialTime'和'EndTime'列
library(data.table)
setDT(df1)[, .(InitialTime = DateTIme[1L], EndTime=DateTIme[.N]) ,
.(Id, Status, Status1 = rleid(Status))][, Status1 := NULL][]
# Id Status InitialTime EndTime
#1: 101 On 10/01/2014 09:32 10/01/2014 10:50
#2: 101 Off 10/01/2014 21:32 11/01/2014 21:21
#3: 127 Off 10/01/2014 10:13 12/01/2014 12:02
#4: 127 On 12/01/2014 21:00 13/01/2014 03:24
#5: 763 Off 11/01/2014 12:01 11/01/2014 22:10
#6: 763 On 12/01/2014 09:32 14/01/2014 15:12
#7: 763 Off 14/01/2014 23:51 15/01/2014 09:23