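My goal is to compute, for each id, the number of minutes between the first and last timestamp in every run of timestamps that are 10 minutes apart. A standalone timestamp should be assigned 10. If there is a run of timestamps, the elapsed time should equal the last timestamp in the run minus the first. For example: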
> test
id timestamp
1 1 2018-01-02 00:40:00
2 1 2018-01-02 00:50:00
4 1 2018-01-02 01:10:00
5 1 2018-01-02 01:20:00
6 1 2018-01-02 01:30:00
7 1 2018-01-02 02:00:00
8 2 2018-01-02 01:50:00
9 2 2018-01-02 02:00:00
10 2 2018-01-02 02:10:00
11 2 2018-01-02 02:20:00
12 2 2018-01-02 02:30:00
13 2 2018-01-02 02:40:00
14 2 2018-01-02 03:10:00
15 2 2018-01-02 03:20:00
should produce the following output:
> output
id period elapsed
1 1 2018-01-02 00:40:00 - 2018-01-02 00:50:00 10
2 1 2018-01-02 01:10:00 - 2018-01-02 01:30:00 20
3 1 2018-01-02 02:00:00 10
4 2 2018-01-02 01:50:00 - 2018-01-02 02:40:00 50
5 2 2018-01-02 03:10:00 - 2018-01-02 03:20:00 10
Any suggestions using data.table or dplyr are appreciated. I'm guessing the pseudocode would look something like this:
setDT(test)
# Pseudocode: first_timestamp / last_timestamp are placeholders for the ends of each run
test[, .(elapsed = ifelse(difftime(last_timestamp, first_timestamp) > 10, difftime(last_timestamp, first_timestamp), 10),
         period = paste(first_timestamp, "-", last_timestamp)), by = id]
Here is the sample dataset:
> dput(test)
structure(list(id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2,
2), timestamp = structure(c(1514875200, 1514875800, 1514877000,
1514877600, 1514878200, 1514880000, 1514879400, 1514880000, 1514880600,
1514881200, 1514881800, 1514882400, 1514884200, 1514884800), class = c("POSIXct",
"POSIXt"), tzone = "America/Chicago")), .Names = c("id", "timestamp"
), row.names = c(1L, 2L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L,
13L, 14L, 15L), class = "data.frame")
Answer 0 (score: 4)
Make a counter that increments whenever the difference to the previous timestamp is > 10 minutes, then group by it:
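setDT(test)
# Within each id, start a new group whenever the gap to the previous row exceeds 10 minutes
test[, grp := cumsum(c(0, diff(timestamp)) > 10), by = id]
# Summarise each (id, grp) run by its first and last timestamp
test[,
  .(
    period  = paste(timestamp[1], timestamp[.N], sep = " - "),
    elapsed = difftime(timestamp[.N], timestamp[1], units = "mins")
  ),
  by = .(id, grp)
]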
# id grp period elapsed
#1: 1 0 2018-01-02 00:40:00 - 2018-01-02 00:50:00 10 mins
#2: 1 1 2018-01-02 01:10:00 - 2018-01-02 01:30:00 20 mins
#3: 1 2 2018-01-02 02:00:00 - 2018-01-02 02:00:00 0 mins
#4: 2 0 2018-01-02 01:50:00 - 2018-01-02 02:40:00 50 mins
#5: 2 1 2018-01-02 03:10:00 - 2018-01-02 03:20:00 10 mins
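Note that the result above gives 0 mins for the standalone timestamp (row 3), whereas the question asks for 10 and for the period to show a single timestamp. A possible variant, offered as a sketch rather than the answerer's code: it floors `elapsed` at 10 and also computes the gaps with explicit units, since `diff()` on POSIXct chooses its units automatically. The helper column `gap` and the variables `t_first` / `t_last` are names introduced here for illustration, and the data is rebuilt from the question's printed table instead of the `dput()` output.

```r
library(data.table)

# Sample data re-typed from the question (timestamps in America/Chicago).
test <- data.table(
  id = rep(c(1, 2), times = c(6, 8)),
  timestamp = as.POSIXct(
    c("2018-01-02 00:40:00", "2018-01-02 00:50:00", "2018-01-02 01:10:00",
      "2018-01-02 01:20:00", "2018-01-02 01:30:00", "2018-01-02 02:00:00",
      "2018-01-02 01:50:00", "2018-01-02 02:00:00", "2018-01-02 02:10:00",
      "2018-01-02 02:20:00", "2018-01-02 02:30:00", "2018-01-02 02:40:00",
      "2018-01-02 03:10:00", "2018-01-02 03:20:00"),
    tz = "America/Chicago"
  )
)

# Gap to the previous row in minutes (0 for the first row of each id).
# Explicit units keep the "> 10" threshold independent of diff()'s auto units.
test[, gap := c(0, as.numeric(diff(timestamp), units = "mins")), by = id]

# A new run starts whenever the gap exceeds 10 minutes.
test[, grp := cumsum(gap > 10), by = id]

fmt <- "%Y-%m-%d %H:%M:%S"
res <- test[, {
  t_first <- timestamp[1]
  t_last  <- timestamp[.N]
  .(
    # A standalone timestamp is shown once instead of as "x - x".
    period  = if (.N == 1L) format(t_first, fmt)
              else paste(format(t_first, fmt), format(t_last, fmt), sep = " - "),
    # Floor at 10 so a standalone timestamp counts as 10 minutes.
    elapsed = max(as.numeric(difftime(t_last, t_first, units = "mins")), 10)
  )
}, by = .(id, grp)]

print(res)
```

Here `elapsed` is a plain numeric column (minutes) rather than a difftime, which keeps the `max()` with 10 simple; drop the `grp` column afterwards with `res[, grp := NULL]` if it is not needed.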