How to convert long format to wide format based on time difference

Time: 2017-05-03 14:20:26

Tags: r data.table

I have the following table:

library(data.table)

data <- data.table(Timestamp = c(as.POSIXct("2016-01-15 02:00:00"),
                             as.POSIXct("2016-01-15 04:00:00"),
                             as.POSIXct("2016-01-16 02:00:00"),
                             as.POSIXct("2016-01-15 05:00:00"),
                             as.POSIXct("2016-01-17 08:00:00"),
                             as.POSIXct("2016-01-17 08:00:00"),
                             as.POSIXct("2016-01-17 09:00:00"),
                             as.POSIXct("2016-01-22 09:00:00")),
               Activty = c("Eating Beef",
                           "Eating Cake",
                           "Eating Beef",
                           "Eating Cake",
                           "Sleeping",
                           "Eating Beef",
                           "Eating Beef",
                           "Sleeping"),
               Tag = c("S",
                       "S",
                       "E",
                       "E",
                       "S",
                       "S",
                       "E",
                       "E"
                       ))

What I want to do is retrieve the start and end time of each activity. Printing the table gives:

            Timestamp     Activty    Tag
1: 2016-01-15 02:00:00 Eating Beef   S
2: 2016-01-15 04:00:00 Eating Cake   S
3: 2016-01-16 02:00:00 Eating Beef   E
4: 2016-01-15 05:00:00 Eating Cake   E
5: 2016-01-17 08:00:00 Sleeping      S
6: 2016-01-17 08:00:00 Eating Beef   S
7: 2016-01-17 09:00:00 Eating Beef   E
8: 2016-01-22 09:00:00 Sleeping      E

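So row 1 is when I started eating beef, and row 3 is when I stopped eating beef. In other words, among rows with the same activity, a row tagged S should be matched with the first E row for that activity. S marks the start of an activity and E marks its end.

How should I approach this in data.table?

The final result should look like this: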

             StartTime             EndTime     Activty
1: 2016-01-15 02:00:00 2016-01-16 02:00:00 Eating Beef
2: 2016-01-15 04:00:00 2016-01-15 05:00:00 Eating Cake
5: 2016-01-17 08:00:00 2016-01-22 09:00:00    Sleeping
6: 2016-01-17 08:00:00 2016-01-17 09:00:00 Eating Beef

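(The row order here is arbitrary; it only serves to illustrate the final result, in which each start time is paired with its end time.)

1 Answer:

Answer 0: (score: 0)

An approach using data.table: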

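library(data.table)
# Sort first so that, within each activity, the k-th start lines up with the k-th end
setorder(data, Activty, Tag, Timestamp)
# Number the rows within each (Activty, Tag) group
data[, n := seq_len(.N), by = .(Activty, Tag)]
# Spread Tag into columns: one row per (Activty, n) pair
x <- dcast(data, Activty + n ~ Tag, value.var = "Timestamp")
x[, n := NULL]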

x

       Activty                   E                   S
1: Eating Beef 2016-01-16 02:00:00 2016-01-15 02:00:00
2: Eating Beef 2016-01-17 09:00:00 2016-01-17 08:00:00
3: Eating Cake 2016-01-15 05:00:00 2016-01-15 04:00:00
4:    Sleeping 2016-01-22 09:00:00 2016-01-17 08:00:00 

This is close to the desired output.
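
To get from the E/S columns to the exact layout asked for in the question, one possible finishing step is to rename and reorder the columns. The following is only a sketch: it rebuilds the question's sample data so that it runs on its own, fixes the time zone to UTC (an assumption, the question does not state one), and uses `res` as a hypothetical result name. It also assumes that intervals of the same activity do not overlap, so that the k-th S of an activity belongs to its k-th E.

```r
library(data.table)

# Same sample data as in the question (column name "Activty" kept as-is).
# tz = "UTC" is an assumption made here for reproducibility.
data <- data.table(
  Timestamp = as.POSIXct(c("2016-01-15 02:00:00", "2016-01-15 04:00:00",
                           "2016-01-16 02:00:00", "2016-01-15 05:00:00",
                           "2016-01-17 08:00:00", "2016-01-17 08:00:00",
                           "2016-01-17 09:00:00", "2016-01-22 09:00:00"),
                         tz = "UTC"),
  Activty = c("Eating Beef", "Eating Cake", "Eating Beef", "Eating Cake",
              "Sleeping", "Eating Beef", "Eating Beef", "Sleeping"),
  Tag = c("S", "S", "E", "E", "S", "S", "E", "E")
)

# Sort by time so the k-th "S" of an activity lines up with its k-th "E".
setorder(data, Activty, Tag, Timestamp)
data[, n := seq_len(.N), by = .(Activty, Tag)]

# One row per (Activty, n); the Tag values become the columns "E" and "S".
res <- dcast(data, Activty + n ~ Tag, value.var = "Timestamp")

# Rename and reorder to match the layout asked for in the question.
setnames(res, c("S", "E"), c("StartTime", "EndTime"))
res[, n := NULL]
setcolorder(res, c("StartTime", "EndTime", "Activty"))
res
```

If intervals of the same activity can overlap or nest, pairing by position is no longer enough, and a different matching rule would be needed.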