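I would like to add a field that counts the number of consecutive days within each group (groups are identified by the id field). I am starting from this: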
dt <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), date = c("1/01/2000", "2/01/2000", "2/01/2000",
"5/01/2000", "6/01/2000", "7/01/2000", "8/01/2000", "13/01/2000", "14/01/2000",
"18/01/2000", "19/01/2000", "21/01/2000", "25/01/2000", "26/01/2000",
"30/01/2000", "31/01/2000")), .Names = c("id", "date"),
row.names = c(NA, -16L), class = "data.frame")
and would like to obtain the following, preferably using data.table:
id       date cons
 1  1/01/2000    0
 1  2/01/2000    1
 1  2/01/2000    1
 1  5/01/2000    0
 1  6/01/2000    1
 1  7/01/2000    2
 1  8/01/2000    3
 2 13/01/2000    0
 2 14/01/2000    1
 2 18/01/2000    0
 2 19/01/2000    1
 2 21/01/2000    0
 2 25/01/2000    0
 2 26/01/2000    1
 2 30/01/2000    0
 2 31/01/2000    1
Answer 0 (score: 4)
Here is one way to do it with dplyr:
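library(dplyr)
dt %>%
  mutate(date = as.Date(date, "%d/%m/%Y")) %>%
  group_by(id) %>%
  # start a new run whenever the gap to the previous date exceeds one day
  # (.add = TRUE replaces the add = TRUE argument deprecated in dplyr 1.0.0)
  group_by(grp = cumsum(c(TRUE, diff(date) > 1)), .add = TRUE) %>%
  mutate(cons = as.integer(date - first(date))) %>%
  ungroup() %>%
  select(-grp)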
# id date cons
# <int> <date> <int>
# 1 1 2000-01-01 0
# 2 1 2000-01-02 1
# 3 1 2000-01-02 1
# 4 1 2000-01-05 0
# 5 1 2000-01-06 1
# 6 1 2000-01-07 2
# 7 1 2000-01-08 3
# 8 2 2000-01-13 0
# 9 2 2000-01-14 1
#10 2 2000-01-18 0
#11 2 2000-01-19 1
#12 2 2000-01-21 0
#13 2 2000-01-25 0
#14 2 2000-01-26 1
#15 2 2000-01-30 0
#16 2 2000-01-31 1
Since the question is tagged data.table, the same logic can be translated to data.table:
library(data.table)
setDT(dt)
dt[, date := as.Date(date, "%d/%m/%Y")]
dt[, cons := as.integer(date - first(date)), .(id, cumsum(c(TRUE, diff(date) > 1)))]
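The grouping key in this answer, cumsum(c(TRUE, diff(date) > 1)), may be easier to follow on a small vector. The following is a minimal base R sketch of that idea only; the dates vector is a hypothetical toy input rather than the question's data, and ave() is swapped in for the dplyr/data.table grouping purely for illustration.

```r
# Hypothetical toy input: already sorted, with one duplicate and one gap
dates <- as.Date(c("2000-01-01", "2000-01-02", "2000-01-02",
                   "2000-01-05", "2000-01-06"))

# TRUE marks the first row of each run: the very first row, plus any row
# whose date is more than one day after the previous row's date
new_run <- c(TRUE, as.integer(diff(dates)) > 1L)

# The cumulative sum turns those markers into a run identifier
grp <- cumsum(new_run)

# Within each run, count the days elapsed since the run's first date
cons <- as.integer(ave(as.numeric(dates), grp, FUN = function(x) x - x[1]))

print(data.frame(dates, grp, cons))
# grp is 1 1 1 2 2 and cons is 0 1 1 0 1
```

Because cons is computed as a difference from the first date in the run, duplicated dates naturally receive the same value. The sketch assumes the dates are sorted within each id, as they are in the question.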
Answer 1 (score: 0)
I may be overcomplicating things, but if your dataset is large, this should be a faster option:
setDT(dt)[, date := as.Date(date, format="%d/%m/%Y")]
#identify consecutive dates
dt[, c("cons", "d", "rr") := .(0L,
d <- c(FALSE, diff(date) == 1L),
rowid(rleid(id, d)))]
#update rows with consecutive dates
idx <- dt[(d), which=TRUE]
set(dt, idx, "cons", dt[idx, rr])
#handle identical dates
ix <- dt[id==shift(id) & c(FALSE, diff(date)==0L), which=TRUE]
set(dt, ix, "cons", dt[ix - 1L, cons])
Output:
id date cons d rr
1: 1 2000-01-01 0 FALSE 1
2: 1 2000-01-02 1 TRUE 1
3: 1 2000-01-02 1 FALSE 1
4: 1 2000-01-05 0 FALSE 2
5: 1 2000-01-06 1 TRUE 1
6: 1 2000-01-07 2 TRUE 2
7: 1 2000-01-08 3 TRUE 3
8: 2 2000-01-13 0 FALSE 1
9: 2 2000-01-14 1 TRUE 1
10: 2 2000-01-18 0 FALSE 1
11: 2 2000-01-19 1 TRUE 1
12: 2 2000-01-21 0 FALSE 1
13: 2 2000-01-25 0 FALSE 2
14: 2 2000-01-26 1 TRUE 1
15: 2 2000-01-30 0 FALSE 1
16: 2 2000-01-31 1 TRUE 1
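The d and rr columns in this output come from data.table's rleid() and rowid() helpers. As a rough illustration of what each one returns, here is a small sketch; the id and d vectors are hypothetical toy inputs chosen for the example, not values from the answer.

```r
library(data.table)

# Hypothetical toy inputs: a group id and a "follows the previous day" flag
id <- c(1L, 1L, 1L, 2L, 2L)
d  <- c(FALSE, TRUE, TRUE, FALSE, TRUE)

# rleid() assigns a new run number whenever id or d differs from the previous row
run <- rleid(id, d)
print(run)  # 1 2 2 3 4

# rowid() gives the 1-based position of each row within its run
pos <- rowid(run)
print(pos)  # 1 1 2 1 1
```

In the answer, that position within a run of TRUE flags is what ends up in cons for rows that directly follow the previous day.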