Conditional elapsed time in data.table

Date: 2018-02-07 22:57:05

Tags: r dplyr data.table subset

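My goal is to compute, by id, the number of minutes elapsed between the first and last timestamp of each run of timestamps that are 10 minutes apart. A standalone timestamp should be assigned 10. If there is a sequence of timestamps, the elapsed time should equal the last timestamp in the sequence minus the first. For example: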

> test
   id           timestamp
1   1 2018-01-02 00:40:00
2   1 2018-01-02 00:50:00
4   1 2018-01-02 01:10:00
5   1 2018-01-02 01:20:00
6   1 2018-01-02 01:30:00
7   1 2018-01-02 02:00:00
8   2 2018-01-02 01:50:00
9   2 2018-01-02 02:00:00
10  2 2018-01-02 02:10:00
11  2 2018-01-02 02:20:00
12  2 2018-01-02 02:30:00
13  2 2018-01-02 02:40:00
14  2 2018-01-02 03:10:00
15  2 2018-01-02 03:20:00

should produce the following output:

> output
  id                                    period elapsed
1  1 2018-01-02 00:40:00 - 2018-01-02 00:50:00      10
2  1 2018-01-02 01:10:00 - 2018-01-02 01:30:00      20
3  1                       2018-01-02 02:00:00      10
4  2 2018-01-02 01:50:00 - 2018-01-02 02:40:00      50
5  2 2018-01-02 03:10:00 - 2018-01-02 03:20:00      10

Any suggestions using data.table or dplyr are appreciated. I'm guessing the pseudocode would look something like this:

    setDT(test)
    test[, .(elapsed = ifelse(difftime(last_timestamp, first_timestamp) > 10, difftime(last_timestamp, first_timestamp), 10),
             period = paste(first_timestamp, "-", last_timestamp)), by = id]

Here is the sample dataset:

   > dput(test)
structure(list(id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 
2), timestamp = structure(c(1514875200, 1514875800, 1514877000, 
1514877600, 1514878200, 1514880000, 1514879400, 1514880000, 1514880600, 
1514881200, 1514881800, 1514882400, 1514884200, 1514884800), class = c("POSIXct", 
"POSIXt"), tzone = "America/Chicago")), .Names = c("id", "timestamp"
), row.names = c(1L, 2L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 
13L, 14L, 15L), class = "data.frame")

1 answer:

Answer 0 (score: 4)

Make a counter that increments whenever the difference to the previous timestamp is > 10 minutes, then group by it:

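library(data.table)
setDT(test)
# Start a new group whenever the gap to the previous timestamp exceeds 10.
# Note: diff() on POSIXct picks its units automatically (minutes for this data).
test[, grp := cumsum(c(0,diff(timestamp)) > 10) , by=id]
# Summarise each run by its first and last timestamp
test[,
  .(
     period  = paste(timestamp[1], timestamp[.N], sep=" - "),
     elapsed = difftime(timestamp[.N], timestamp[1], units="mins")
   ),
  by=.(id,grp)
]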

#   id grp                                    period elapsed
#1:  1   0 2018-01-02 00:40:00 - 2018-01-02 00:50:00 10 mins
#2:  1   1 2018-01-02 01:10:00 - 2018-01-02 01:30:00 20 mins
#3:  1   2 2018-01-02 02:00:00 - 2018-01-02 02:00:00  0 mins
#4:  2   0 2018-01-02 01:50:00 - 2018-01-02 02:40:00 50 mins
#5:  2   1 2018-01-02 03:10:00 - 2018-01-02 03:20:00 10 mins
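
The output above reports 0 mins for the standalone timestamp, whereas the question asks for 10. A minimal sketch of one way to close that gap follows. It is a variant of the grouping idea above, not the answerer's code. The name `gap_mins` and the choice to compare gaps in seconds are assumptions made here so that the units are explicit and do not depend on what `diff()` picks automatically.

```r
library(data.table)

# Sample data from the question, rebuilt from the epoch seconds in dput(test)
test <- data.table(
  id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2),
  timestamp = as.POSIXct(
    c(1514875200, 1514875800, 1514877000, 1514877600, 1514878200, 1514880000,
      1514879400, 1514880000, 1514880600, 1514881200, 1514881800, 1514882400,
      1514884200, 1514884800),
    origin = "1970-01-01", tz = "America/Chicago"
  )
)

gap_mins <- 10  # hypothetical name: the spacing threshold, in minutes

# Start a new run whenever the gap to the previous timestamp exceeds the
# threshold; comparing in seconds keeps the units explicit.
test[, grp := cumsum(c(0, diff(as.numeric(timestamp))) > gap_mins * 60), by = id]

result <- test[, {
  t_first <- timestamp[1]
  t_last  <- timestamp[.N]
  list(
    # A single timestamp is shown on its own rather than as "x - x"
    period  = if (.N == 1L) format(t_first)
              else paste(format(t_first), format(t_last), sep = " - "),
    # Floor the elapsed minutes at the threshold so standalone rows get 10
    elapsed = max(as.numeric(difftime(t_last, t_first, units = "mins")), gap_mins)
  )
}, by = .(id, grp)]

result[, grp := NULL]  # drop the helper column
print(result)
```

On the sample data this should give elapsed values of 10, 20, 10, 50 and 10, matching the output requested in the question. Note that `elapsed` is returned as a plain number of minutes rather than a `difftime`.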