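Suppose I have long-format data like this, with multiple event outcomes per ID (the real data contains many IDs; this is a simplified version).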
data <- data.frame(id=c(rep(1, 4), rep(2, 3), rep(3, 3)),
event=c(1, 1, 0, 0, 1, 1, 0, 1, 1, 0),
eventcount=c(1, 2, 0, 0, 1, 2, 0, 1, 2, 0),
firstevent=c(1, 0, 0, 0, 1, 0, 0, 1, 0, 0),
time=c(100, 250, 150, 300, 240, 400, 150, 350, 700, 200) )
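I want to catch events that occur within a specific time window after the first event. In this case, I want to detect a second event that occurs within 100 to 150 days of the first event. In Stata, we can use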
gen event2=1 if id==id[_n-1]& time-time[_n-1]>100 & time-time[_n-1]<=150 & firstevent[_n-1]==1 & firstevent==0 & event==1
forvalues i = 2/3 {
replace event2=1 if id==id[_n-`i']& time-time[_n-`i']>100 &time-time[_n-`i']<=150 & firstevent[_n-`i']==1 & firstevent==0 & event==1
}
In this case, the desired result is:
data_after <- data.frame(id=c(rep(1, 4), rep(2, 3), rep(3, 3)),
event=c(1, 1, 0, 0, 1, 1, 0, 1, 1, 0),
eventcount=c(1, 2, 0, 0, 1, 2, 0, 1, 2, 0),
firstevent=c(1, 0, 0, 0, 1, 0, 0, 1, 0, 0),
time=c(100, 250, 150, 300, 240, 400, 150, 350, 700, 200),
event2=c(NA, 1, NA, NA, NA, NA, NA, NA, NA, NA))
How should I write this in R?
Answer 0 (score: 0)
intervals = ave(
data$time,
data$id,
FUN = function(x)
c(0, diff(x))
)
intervals
# [1] 0 150 -100 150 0 160 -250 0 350 -500
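# Flag rows that are events and whose gap from the previous row is 100 to 150 days.
# Note: the lower bound is inclusive here (>= 100), while the Stata code uses > 100.
meets_duration_requirement = ave(
intervals,
data$id,
FUN = function(x)
x >= 100 & x <= 150
) == 1 & data$event == 1
# Keep a flagged row only if it is the second row within its id
# (seq_along gives the row position within each id, not an event count).
choose_second = meets_duration_requirement &
ave(meets_duration_requirement, data$id, FUN = seq_along) == 2 # to target the third row instead, change this to 3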
replace(x = rep(NA, NROW(data)),
list = choose_second,
1)
# [1] NA 1 NA NA NA NA NA NA NA NA
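An alternative sketch, not part of the answer above: the Stata code compares each row with the row where firstevent == 1, whereas the code above compares each row with the row immediately before it. The version below measures the time elapsed since the first event within each id. It assumes at most one firstevent == 1 row per id and, unlike the Stata code, does not check that the first event sits in one of the three preceding rows. The names first_time, since_first, and hit are introduced here for illustration only.

```r
data <- data.frame(
  id         = c(rep(1, 4), rep(2, 3), rep(3, 3)),
  event      = c(1, 1, 0, 0, 1, 1, 0, 1, 1, 0),
  eventcount = c(1, 2, 0, 0, 1, 2, 0, 1, 2, 0),
  firstevent = c(1, 0, 0, 0, 1, 0, 0, 1, 0, 0),
  time       = c(100, 250, 150, 300, 240, 400, 150, 350, 700, 200)
)

# Time of the first event, repeated on every row of the same id.
# Assumption: at most one row per id has firstevent == 1.
first_time <- ave(
  ifelse(data$firstevent == 1, data$time, NA),
  data$id,
  FUN = function(x) x[!is.na(x)][1]
)

# Days elapsed since the first event of the same id.
since_first <- data$time - first_time

# Same bounds as the Stata code: more than 100 and at most 150 days,
# restricted to rows that are events but not the first event.
hit <- data$event == 1 & data$firstevent == 0 &
  since_first > 100 & since_first <= 150

data$event2 <- ifelse(hit, 1, NA)
data$event2
# Row 2 is flagged with 1; all other rows are NA.
```

On the sample data this reproduces the event2 column of data_after. To target a different window, change the two bounds in hit.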