Time difference from the last episode to a specific date in R

Time: 2017-05-30 06:21:33

Tags: r

This is part of a large dataset I have:

id=rep(c("J1", "J2", "J3"), each = 6)
episodes=c(1,1,0,0,1,0,1,0,0,1,0,0,1,0,1,1,0,1)
follow_up=as.Date(c("2004-11-05","2004-12-12","2004-12-26","2005-01-12","2005-01-24",
        "2005-02-18","2005-07-14","2005-08-22","2005-10-17","2005-12-19",
        "2006-01-14","2006-02-12","2006-03-05","2006-04-12","2006-05-22",
        "2006-06-18","2006-07-21","2006-08-12"))
measurement_date=as.factor(c(rep(c(0),times=3),"2005-01-12",rep(c(0),times=4),
              "2005-10-17",rep(c(0),times=7),"2005-07-21",0))
df=data.frame(id,episodes,follow_up,measurement_date)
df$measurement_date[df$measurement_date == 0] <- NA

 df
   id episodes  follow_up measurement_date
1  J1        1 2004-11-05             <NA>
2  J1        1 2004-12-12             <NA>
3  J1        0 2004-12-26             <NA>
4  J1        0 2005-01-12       2005-01-12
5  J1        1 2005-01-24             <NA>
6  J1        0 2005-02-18             <NA>
7  J2        1 2005-07-14             <NA>
8  J2        0 2005-08-22             <NA>
9  J2        0 2005-10-17       2005-10-17
10 J2        1 2005-12-19             <NA>
11 J2        0 2006-01-14             <NA>
12 J2        0 2006-02-12             <NA>
13 J3        1 2006-03-05             <NA>
14 J3        0 2006-04-12             <NA>
15 J3        1 2006-05-22             <NA>
16 J3        1 2006-06-18             <NA>
17 J3        0 2006-07-21       2005-07-21
18 J3        1 2006-08-12             <NA>

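I want to find the time difference from the last episode (episodes == 1) to the measurement date, for each id. This is a large dataset, so how should I go about it? For example, for J1 the last episode before the measurement was on 12/12/2004, and I want the difference between that date and 12/01/2005 (dates written as dd/mm/yyyy).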

1 answer:

Answer 0: (score: 0)

We can try

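library(data.table)
# Convert df to a data.table by reference and turn the factor column into Date
setDT(df)[, measurement_date := as.Date(measurement_date)]
df[, {i1 <- which(!is.na(measurement_date))  # row of the measurement within this id
      i2 <- which(episodes == 1)             # rows where an episode occurred
      # last episode row strictly before the measurement row, then the date difference
      .(date_diff = measurement_date[i1] - follow_up[tail(i2[i2 < i1], 1)]) },
   by = id]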
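
For anyone who wants to check the logic without data.table, here is a minimal base R sketch of the same idea. It rebuilds the question's example data with measurement_date stored directly as a Date. The helper name last_episode_gap is made up for illustration, and the sketch assumes at most one measurement per id and that rows are already in follow-up order.

```r
# Example data from the question; measurement_date is NA where nothing was measured.
df <- data.frame(
  id = rep(c("J1", "J2", "J3"), each = 6),
  episodes = c(1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1),
  follow_up = as.Date(c("2004-11-05", "2004-12-12", "2004-12-26", "2005-01-12",
                        "2005-01-24", "2005-02-18", "2005-07-14", "2005-08-22",
                        "2005-10-17", "2005-12-19", "2006-01-14", "2006-02-12",
                        "2006-03-05", "2006-04-12", "2006-05-22", "2006-06-18",
                        "2006-07-21", "2006-08-12")),
  measurement_date = as.Date(c(rep(NA, 3), "2005-01-12", rep(NA, 4), "2005-10-17",
                               rep(NA, 7), "2005-07-21", NA))
)

# Hypothetical helper: days from the last episode row before the measurement
# row to the measurement date, for the rows of a single id.
last_episode_gap <- function(d) {
  m <- which(!is.na(d$measurement_date))[1]   # assumes at most one measurement per id
  if (is.na(m)) return(NA_real_)              # no measurement for this id
  prior <- which(d$episodes == 1 & seq_len(nrow(d)) < m)
  if (length(prior) == 0) return(NA_real_)    # no episode before the measurement
  as.numeric(d$measurement_date[m] - d$follow_up[max(prior)], units = "days")
}

# One value per id, as a named numeric vector
res <- sapply(split(df, df$id), last_episode_gap)
res
```

For J1 this gives 31 days (2004-12-12 to 2005-01-12), which matches the example in the question. Note that the J3 measurement_date is typed as 2005-07-21 in the question's data while the follow-up in that row is 2006-07-21, so the J3 gap comes out negative; if that is a typo in the data, correcting the date would fix it.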