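I am trying to create a new variable that gives the time elapsed from each "rare_event" to the next "common_event", computed separately within each "id".
Here is some sample data: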
df <- data.frame(id = c(rep("A", 10), rep("B", 10), rep("C", 10)),
                 time = rep(1:10, 3),
                 common_event = c(rep(0, 3), 1, rep(0, 3), 1, rep(0, 5), 1, rep(0, 5), 1, rep(0, 3), 1, rep(0, 2), 1, rep(0, 2), 1),
                 rare_event = c(rep(0, 5), 1, rep(0, 3), 1, rep(0, 6), 1, rep(0, 7), 1, 1, rep(0, 3), 1))
I thought this would be easy, but I have run into problems:
I tried for loops as well as dplyr's lead and lag, without success.
This is the expected result:
desired <- data.frame(id = c(rep("A", 10), rep("B", 10), rep("C", 10)),
                      time = rep(1:10, 3),
                      common_event = c(rep(0, 3), 1, rep(0, 3), 1, rep(0, 5), 1, rep(0, 5), 1, rep(0, 3), 1, rep(0, 2), 1, rep(0, 2), 1),
                      rare_event = c(rep(0, 5), 1, rep(0, 3), 1, rep(0, 6), 1, rep(0, 7), 1, 1, rep(0, 3), 1),
                      interval = c(rep(0, 5), 2, rep(0, 10), 3, rep(0, 7), 2, 1, rep(0, 4)))
Is there any way to solve this?
Answer 0 (score: 0)
There may be a more elegant way, but this seems to work.
First, define a function foo.
foo <- function(time, common, rare) {
  interval <- rep(0, length(time))
  # Loop over row positions, not time values, so time need not run 1..n
  for (i in seq_along(time)) {
    if (rare[i] == 1) {
      # Times of common events strictly after this rare event, if any
      later <- time[time > time[i] & common == 1]
      if (length(later) > 0) {
        interval[i] <- min(later) - time[i]
      }
    }
  }
  interval
}
Then apply this function to each id of the data frame df:
library(dplyr)
df %>%
  group_by(id) %>%
  mutate(interval = foo(time, common_event, rare_event))
Output:
   id time common_event rare_event interval
1   A    1            0          0        0
2   A    2            0          0        0
3   A    3            0          0        0
4   A    4            1          0        0
5   A    5            0          0        0
6   A    6            0          1        2
7   A    7            0          0        0
8   A    8            1          0        0
9   A    9            0          0        0
10  A   10            0          1        0
11  B    1            0          0        0
12  B    2            0          0        0
13  B    3            0          0        0
14  B    4            1          0        0
15  B    5            0          0        0
16  B    6            0          0        0
17  B    7            0          1        3
18  B    8            0          0        0
19  B    9            0          0        0
20  B   10            1          0        0
21  C    1            0          0        0
22  C    2            0          0        0
23  C    3            0          0        0
24  C    4            1          0        0
25  C    5            0          1        2
26  C    6            0          1        1
27  C    7            1          0        0
28  C    8            0          0        0
29  C    9            0          0        0
30  C   10            1          1        0
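
As for the "more elegant way" mentioned at the start of this answer, the loop could probably be replaced by a reverse running minimum. The following is only a sketch under assumptions: the helper name next_common_interval is made up for illustration, and it assumes that rows are sorted by increasing time within each id with one row per time point, which holds for the sample data. It uses base R only, so it does not need dplyr.

```r
# Hypothetical helper (not part of the original answer): time from each rare
# event to the next common event strictly after it, 0 on all other rows.
# Assumes the vectors belong to one id and are sorted by increasing time.
next_common_interval <- function(time, common, rare) {
  # Time of each common event, Inf on rows without one
  t_common <- ifelse(common == 1, time, Inf)
  # Earliest common-event time at or after each row (reverse running minimum)
  at_or_after <- rev(cummin(rev(t_common)))
  # Shift by one row so that only strictly later events count
  strictly_after <- c(at_or_after[-1], Inf)
  ifelse(rare == 1 & is.finite(strictly_after), strictly_after - time, 0)
}

# Sample data from the question
df <- data.frame(id = c(rep("A", 10), rep("B", 10), rep("C", 10)),
                 time = rep(1:10, 3),
                 common_event = c(rep(0, 3), 1, rep(0, 3), 1, rep(0, 5), 1, rep(0, 5), 1, rep(0, 3), 1, rep(0, 2), 1, rep(0, 2), 1),
                 rare_event = c(rep(0, 5), 1, rep(0, 3), 1, rep(0, 6), 1, rep(0, 7), 1, 1, rep(0, 3), 1))

# Apply per id; split() and unsplit() restore the original row order
pieces <- lapply(split(df, df$id), function(g) {
  next_common_interval(g$time, g$common_event, g$rare_event)
})
df$interval <- unsplit(pieces, df$id)
```

Because the helper takes and returns plain vectors of equal length, it should also work in place of foo inside group_by(id) and mutate, as long as the rows are sorted by time within each id.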