This is part of a large dataset I have:
id=rep(c("J1", "J2", "J3"), each = 6)
episodes=c(1,1,0,0,1,0,1,0,0,1,0,0,1,0,1,1,0,1)
follow_up=as.Date(c("2004-11-05","2004-12-12","2004-12-26","2005-01-12","2005-01-24",
"2005-02-18","2005-07-14","2005-08-22","2005-10-17","2005-12-19",
"2006-01-14","2006-02-12","2006-03-05","2006-04-12","2006-05-22",
"2006-06-18","2006-07-21","2006-08-12"))
measurement_date=as.factor(c(rep(c(0),times=3),"2005-01-12",rep(c(0),times=4),
"2005-10-17",rep(c(0),times=7),"2005-07-21",0))
df=data.frame(id,episodes,follow_up,measurement_date)
df$measurement_date[df$measurement_date == 0] <- NA
df
   id episodes  follow_up measurement_date
1  J1        1 2004-11-05             <NA>
2  J1        1 2004-12-12             <NA>
3  J1        0 2004-12-26             <NA>
4  J1        0 2005-01-12       2005-01-12
5  J1        1 2005-01-24             <NA>
6  J1        0 2005-02-18             <NA>
7  J2        1 2005-07-14             <NA>
8  J2        0 2005-08-22             <NA>
9  J2        0 2005-10-17       2005-10-17
10 J2        1 2005-12-19             <NA>
11 J2        0 2006-01-14             <NA>
12 J2        0 2006-02-12             <NA>
13 J3        1 2006-03-05             <NA>
14 J3        0 2006-04-12             <NA>
15 J3        1 2006-05-22             <NA>
16 J3        1 2006-06-18             <NA>
17 J3        0 2006-07-21       2005-07-21
18 J3        1 2006-08-12             <NA>
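For each id, I want to find the time difference between the measurement date and the last episode (episodes == 1) before it. The real dataset is large, so how should I go about this? For example, for J1 the last episode before the measurement was on 2004-12-12, so I need the difference between that date and 2005-01-12.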
Answer 0: (score: 0)
We can try
library(data.table)
setDT(df)[, measurement_date := as.Date(measurement_date)]
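df[, {
    # row position of the measurement within this id
    i1 <- which(!is.na(measurement_date))
    # row positions of the episodes within this id
    i2 <- which(episodes == 1)
    # measurement date minus the follow-up date of the last episode in an earlier row
    .(date_diff = measurement_date[i1] - follow_up[tail(i2[which(i1 > i2)], 1)])
  }, by = id]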
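For comparison, here is a minimal base R sketch of the same idea. It is not the answer's method: it picks the last episode by comparing dates rather than row positions, and it assumes at most one measurement date per id. The helper name last_episode_gap and the column last_episode are made up for this illustration.

```r
# Rebuild the sample data from the question, with missing measurements as NA
id <- rep(c("J1", "J2", "J3"), each = 6)
episodes <- c(1,1,0,0,1,0, 1,0,0,1,0,0, 1,0,1,1,0,1)
follow_up <- as.Date(c("2004-11-05","2004-12-12","2004-12-26","2005-01-12","2005-01-24",
                       "2005-02-18","2005-07-14","2005-08-22","2005-10-17","2005-12-19",
                       "2006-01-14","2006-02-12","2006-03-05","2006-04-12","2006-05-22",
                       "2006-06-18","2006-07-21","2006-08-12"))
md <- rep(NA_character_, 18)
md[c(4, 9, 17)] <- c("2005-01-12", "2005-10-17", "2005-07-21")
df <- data.frame(id, episodes, follow_up, measurement_date = as.Date(md))

# Hypothetical helper: days between the measurement and the latest earlier episode
last_episode_gap <- function(d) {
  m <- d$measurement_date[!is.na(d$measurement_date)]
  if (length(m) == 0) return(NULL)   # no measurement for this id
  m <- m[1]                          # assumption: one measurement per id
  # follow-up dates of episodes that happened strictly before the measurement
  prior <- d$follow_up[d$episodes == 1 & d$follow_up < m]
  last_ep <- if (length(prior) > 0) max(prior) else as.Date(NA)
  data.frame(id = d$id[1], measurement_date = m,
             last_episode = last_ep,
             date_diff = as.numeric(m - last_ep))   # difference in days
}

# Apply the helper to each id and stack the per-id results
result <- do.call(rbind, lapply(split(df, df$id), last_episode_gap))
print(result)
```

On the sample data this gives 31 days for J1 and 95 days for J2. For J3, the measurement date in the sample (2005-07-21) is earlier than every follow-up date, so this sketch returns NA, whereas the positional approach returns a negative difference. For a large dataset the data.table version is likely to be faster; the sketch is mainly meant to make the logic explicit.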