Question

I have a dataset in R like the one below, with one row per subject:

> ( fake = data.frame(id=c(1,2,3), x=c(42,61,50), event=c(0,0,1), followup=c(6,2,12)) )

  id  x event followup
1  1 42     0        6
2  2 61     0        2
3  3 50     1       12

I want to split the dataset into intervals defined by the observed event times:

  id  x event start.time stop.time
1  1 42     0          0         2
2  1 42     0          2         6
3  2 61     0          0         2
4  3 50     0          0         2
5  3 50     0          2         6
6  3 50     1          6        12

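So each subject gets an interval for every event time up to his own follow-up time. Subject 3, who has an event at time 12, gets event = 0 in the earlier intervals, during which he was still alive.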

How can I do this? The real dataset has about 20,000 rows and 900 unique event times.

Answer 1

The conditions are not very clear, but this reproduces the expected output for the example data:

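 res <- do.call(rbind, lapply(split(fake, fake$id), function(x) {
     x1 <- x$followup
     # Cut points: 0 plus every observed follow-up time (0, 2, 6, 12 for `fake`)
     indx <- c(0, sort(unique(fake$followup)))
     # Keep the cut points up to this subject's own follow-up time
     indx1 <- indx[1:which(indx == x1)]
     # Duplicate the interior cut points so consecutive pairs form (start, stop)
     indx2 <- rep(indx1, each = 2)
     indx3 <- indx2[-c(1, length(indx2))]
     x2 <- do.call(rbind, lapply(split(indx3, (seq_along(indx3) - 1) %/% 2 + 1), function(y)
         data.frame(id = x$id, x = x$x, event = x$event, start.time = y[1], stop.time = y[2])))
     # A subject with an event keeps event = 1 only in the final interval
     if (all(x2$event != 0))
         x2$event[-length(x2$event)] <- 0
     x2
 }))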


 row.names(res) <- 1:nrow(res)
 res
 #  id  x event start.time stop.time
 #1  1 42     0          0         2
 #2  1 42     0          2         6
 #3  2 61     0          0         2
 #4  3 50     0          0         2
 #5  3 50     0          2         6
 #6  3 50     1          6        12
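
With about 20,000 rows, building one small data frame per subject and calling rbind on all of them may get slow. Below is a vectorized sketch of the same idea in base R. The helper name `split_at_times` and its `cuts` argument are made up for illustration, and it assumes that all follow-up times are positive and that the cut points are every observed follow-up time, which is what the expected output in the question implies.

```r
# Example data from the question
fake <- data.frame(id = c(1, 2, 3), x = c(42, 61, 50),
                   event = c(0, 0, 1), followup = c(6, 2, 12))

# split_at_times() is a hypothetical helper, not part of any package.
# Assumption: follow-up times and cut points are all > 0.
split_at_times <- function(dat, cuts = dat$followup) {
  cuts <- sort(unique(cuts))
  # Intervals per subject: cut points strictly before the follow-up time, plus one
  n_int <- findInterval(dat$followup, cuts, left.open = TRUE) + 1
  idx <- rep(seq_len(nrow(dat)), times = n_int)  # source row of each output row
  k <- sequence(n_int)                           # interval number within a subject
  lower <- c(0, cuts)    # start of the k-th interval
  upper <- c(cuts, Inf)  # end of the k-th interval, before capping at follow-up
  data.frame(
    id = dat$id[idx],
    x = dat$x[idx],
    # The subject's own event indicator applies only to the last interval
    event = ifelse(k == n_int[idx], dat$event[idx], 0),
    start.time = lower[k],
    stop.time = pmin(upper[k], dat$followup[idx])
  )
}

out <- split_at_times(fake)
out
```

For the example data this should give the same six rows as above. To cut only at times where an event actually occurred, something like `split_at_times(fake, cuts = fake$followup[fake$event == 1])` could be used instead. The survival package also provides `survSplit()` for this kind of splitting, which may be the better choice if the result feeds into a Cox model.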
