I have the following table:
library(data.table)
data <- data.table(Timestamp = c(as.POSIXct("2016-01-15 02:00:00"),
as.POSIXct("2016-01-15 04:00:00"),
as.POSIXct("2016-01-16 02:00:00"),
as.POSIXct("2016-01-15 05:00:00"),
as.POSIXct("2016-01-17 08:00:00"),
as.POSIXct("2016-01-17 08:00:00"),
as.POSIXct("2016-01-17 09:00:00"),
as.POSIXct("2016-01-22 09:00:00")),
Activty = c("Eating Beef",
"Eating Cake",
"Eating Beef",
"Eating Cake",
"Sleeping",
"Eating Beef",
"Eating Beef",
"Sleeping"),
Tag = c("S",
"S",
"E",
"E",
"S",
"S",
"E",
"E"
))
What I want to do is retrieve the start and end times. Printing the table gives:
Timestamp Activty Tag
1: 2016-01-15 02:00:00 Eating Beef S
2: 2016-01-15 04:00:00 Eating Cake S
3: 2016-01-16 02:00:00 Eating Beef E
4: 2016-01-15 05:00:00 Eating Cake E
5: 2016-01-17 08:00:00 Sleeping S
6: 2016-01-17 08:00:00 Eating Beef S
7: 2016-01-17 09:00:00 Eating Beef E
8: 2016-01-22 09:00:00 Sleeping E
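So row 1 is when I started eating beef, and row 3 is when I stopped eating beef. In other words, among rows with the same activity, a row tagged S should be matched with the first E. S marks the start of an activity and E marks its end.
How should I approach this in data.table?
The final result should look like this: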
StartTime EndTime Activty
1: 2016-01-15 02:00:00 2016-01-16 02:00:00 Eating Beef
2: 2016-01-15 04:00:00 2016-01-15 05:00:00 Eating Cake
5: 2016-01-17 08:00:00 2016-01-22 09:00:00 Sleeping
6: 2016-01-17 08:00:00 2016-01-17 09:00:00 Eating Beef
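(The row order is arbitrary; this is only meant to illustrate the final result, i.e. each start time combined with its end time.)
Answer 0 (score: 0)
An approach using data.table: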
library(data.table)
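# Sort first, so that the counter n follows time order within each Activty/Tag group
setorder(data, Activty, Tag, Timestamp)
data[, n := seq_len(.N), by = .(Activty, Tag)]
# Spread the S and E timestamps into columns: one row per (Activty, n) pair
x <- dcast(data, Activty + n ~ Tag, value.var = "Timestamp")
x[, n := NULL]
x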
Activty E S
1: Eating Beef 2016-01-16 02:00:00 2016-01-15 02:00:00
2: Eating Beef 2016-01-17 09:00:00 2016-01-17 08:00:00
3: Eating Cake 2016-01-15 05:00:00 2016-01-15 04:00:00
4: Sleeping 2016-01-22 09:00:00 2016-01-17 08:00:00
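This is close to the desired output: every start (S) is paired with its end (E), but the columns are named E and S rather than StartTime and EndTime.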
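To get from there to the exact layout asked for in the question, the reshaped table only needs its columns renamed and reordered. The following is a minimal, self-contained sketch of that step, not a definitive solution. It assumes that the k-th S of an activity always belongs to the k-th E of the same activity (so intervals of the same activity never nest or overlap), and it fixes the time zone to UTC purely to make the example reproducible. The name `res` is introduced here for illustration.

```r
library(data.table)

# Sample data from the question (the column name "Activty" is kept as-is).
# tz = "UTC" is an assumption made only for reproducibility.
data <- data.table(
  Timestamp = as.POSIXct(c("2016-01-15 02:00:00", "2016-01-15 04:00:00",
                           "2016-01-16 02:00:00", "2016-01-15 05:00:00",
                           "2016-01-17 08:00:00", "2016-01-17 08:00:00",
                           "2016-01-17 09:00:00", "2016-01-22 09:00:00"),
                         tz = "UTC"),
  Activty = c("Eating Beef", "Eating Cake", "Eating Beef", "Eating Cake",
              "Sleeping", "Eating Beef", "Eating Beef", "Sleeping"),
  Tag = c("S", "S", "E", "E", "S", "S", "E", "E")
)

# Number the k-th S and the k-th E of each activity in time order.
# Assumption: the k-th start pairs with the k-th end of the same activity.
setorder(data, Activty, Tag, Timestamp)
data[, n := rowid(Activty, Tag)]

# One row per (Activty, n), with the S and E timestamps as columns.
res <- dcast(data, Activty + n ~ Tag, value.var = "Timestamp")

# Rename and reorder the columns to match the requested layout.
setnames(res, c("S", "E"), c("StartTime", "EndTime"))
res[, n := NULL]
setcolorder(res, c("StartTime", "EndTime", "Activty"))
setorder(res, StartTime)
res
```

If activities of the same kind can overlap, the positional pairing above no longer holds and a different matching rule (for example a rolling join on Timestamp) would be needed.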