Question

I am trying to create a new variable that represents the time elapsed from a "rare_event" to the next "common_event", grouped by "id".

Here is some code:

df <- data.frame(id = c(rep("A",10), rep("B",10), rep("C",10)), 
                 time = c(rep(seq(1:10),3)),
                 common_event = c(rep(0, 3), 1, rep(0, 3), 1, rep(0, 5), 1, rep(0, 5), 1, rep(0, 3), 1, rep(0, 2), 1, rep(0, 2), 1),
                 rare_event = c(rep(0, 5), 1, rep(0, 3), 1, rep(0, 6), 1, rep(0, 7), 1, 1, rep(0, 3), 1))

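I thought this would be easier, but there are a few complications:

Not every "id" has a "rare_event" == TRUE.
Some "id"s have more than one "rare_event" before the next "common_event", for example rows 25 and 26.

I tried for loops and dplyr's lead and lag, but without success.

Here is the expected result: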

desired <- data.frame(id = c(rep("A",10), rep("B",10), rep("C",10)), 
                 time = c(rep(seq(1:10),3)),
                 common_event = c(rep(0, 3), 1, rep(0, 3), 1, rep(0, 5), 1, rep(0, 5), 1, rep(0, 3), 1, rep(0, 2), 1, rep(0, 2), 1),
                 rare_event = c(rep(0, 5), 1, rep(0, 3), 1, rep(0, 6), 1, rep(0, 7), 1, 1, rep(0, 3), 1),
                 interval = c(rep(0, 5), 2, rep(0, 10), 3, rep(0, 7), 2, 1, rep(0, 4)))

Is there any way to solve this?

Answer 1

There may be a more elegant way, but this seems to work.

First, define a function foo.

foo <- function(time, common, rare) {
  # Default interval is 0 for rows without a qualifying rare event
  interval <- rep(0, length(time))
  # Loop over row positions, so time does not have to equal the row index
  for (i in seq_along(time)) {
    if (rare[i] == 1) {
      # Find the next time when a common event occurred, if any
      later <- time > time[i] & common == 1
      if (any(later)) {
        interval[i] <- min(time[later]) - time[i]
      }
    }
  }
  interval
}
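
To make the core step of the function concrete, here is a minimal sketch that works through two rare events by hand. The values are those of id "C" in the question; the variable names (t1, later1, gap1, and so on) are only illustrative.

```r
# Values for id "C", copied from the question's data
time   <- 1:10
common <- c(0, 0, 0, 1, 0, 0, 1, 0, 0, 1)

# Rare event at time 5: later common events are at 7 and 10, the nearest is 7
t1 <- 5
later1 <- time[time > t1 & common == 1]
gap1 <- min(later1) - t1   # 7 - 5

# Rare event at time 10: nothing comes later, so the subset is empty
t2 <- 10
later2 <- time[time > t2 & common == 1]
has_next <- length(later2) > 0   # FALSE, so the interval stays 0
```

Calling min() on the empty subset would return Inf with a warning, which is why the function checks that a later common event exists before taking the minimum.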

Then apply this function to each id of the data frame df:

library(dplyr)
df %>% group_by(id) %>%
  mutate(interval = foo(time, common_event, rare_event))

Output:

   id time common_event rare_event interval
1   A    1            0          0        0
2   A    2            0          0        0
3   A    3            0          0        0
4   A    4            1          0        0
5   A    5            0          0        0
6   A    6            0          1        2
7   A    7            0          0        0
8   A    8            1          0        0
9   A    9            0          0        0
10  A   10            0          1        0

11  B    1            0          0        0
12  B    2            0          0        0
13  B    3            0          0        0
14  B    4            1          0        0
15  B    5            0          0        0
16  B    6            0          0        0
17  B    7            0          1        3
18  B    8            0          0        0
19  B    9            0          0        0
20  B   10            1          0        0

21  C    1            0          0        0
22  C    2            0          0        0
23  C    3            0          0        0
24  C    4            1          0        0
25  C    5            0          1        2
26  C    6            0          1        1
27  C    7            1          0        0
28  C    8            0          0        0
29  C    9            0          0        0
30  C   10            1          1        0
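
As a cross-check, the following sketch recomputes the interval without a loop and without dplyr, then compares it with the interval column the question asks for. The helper gap_to_next_common is hypothetical, not part of the original answer, and it assumes the rows are sorted by time within each id.

```r
# Data and expected interval, copied from the question
df <- data.frame(
  id = rep(c("A", "B", "C"), each = 10),
  time = rep(1:10, 3),
  common_event = c(rep(0, 3), 1, rep(0, 3), 1, rep(0, 5), 1, rep(0, 5), 1,
                   rep(0, 3), 1, rep(0, 2), 1, rep(0, 2), 1),
  rare_event = c(rep(0, 5), 1, rep(0, 3), 1, rep(0, 6), 1, rep(0, 7), 1, 1,
                 rep(0, 3), 1)
)
expected <- c(rep(0, 5), 2, rep(0, 10), 3, rep(0, 7), 2, 1, rep(0, 4))

# Hypothetical helper: assumes rows are sorted by time within the group
gap_to_next_common <- function(time, common, rare) {
  # Time of each common event, Inf on all other rows
  common_time <- ifelse(common == 1, time, Inf)
  # Shift up by one row so a common event on the same row does not count
  later <- c(common_time[-1], Inf)
  # Reverse cumulative minimum: nearest common event strictly after each row
  next_common <- rev(cummin(rev(later)))
  # Rare events with no later common event keep an interval of 0
  ifelse(rare == 1 & is.finite(next_common), next_common - time, 0)
}

# Apply the helper per id with split/unsplit instead of group_by/mutate
per_id <- lapply(split(df, df$id), function(g) {
  gap_to_next_common(g$time, g$common_event, g$rare_event)
})
df$interval <- unsplit(per_id, df$id)

# TRUE when the result matches the interval column requested in the question
matches <- isTRUE(all.equal(df$interval, expected))
```

The shift by one row is what keeps row 30 at 0: a rare event that coincides with a common event looks only at strictly later rows, as in the expected result.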
