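I have a dataset of repeated measurements on individuals. Each observation has two date variables: an entry date and an exit date. If the time between the previous exit date and the next entry date is less than or equal to 30 days, it is registered as an event.
What I want to do is keep track of the number of events a person has had in the past 365 days each time they are measured again. I'm having trouble creating this decay.
Here is an example dataset, along with a solution that is not correct: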
library(tidyverse)
library(lubridate)
tib_ex <- tibble(
  id = c(1, 1, 1, 1,
         2, 2, 2, 2, 2),
  date_in = ymd(c('2008-07-31', '2008-08-29', '2008-09-15', '2009-05-05',
                  '2010-08-03', '2010-08-29', '2010-09-25',
                  '2011-09-11', '2011-12-12')),
  date_out = ymd(c('2008-08-08', '2008-09-01', '2009-03-16', '2009-05-14',
                   '2010-08-20', '2010-09-01', '2010-11-07',
                   '2011-11-25', '2011-12-16'))
)
tib_ex <- tib_ex %>%
  group_by(id) %>%
  mutate(time_between = as.numeric(date_in - lag(date_out)),
         time_state = as.numeric(date_out - date_in),
         return_30 = ifelse(time_between <= 30, 1, 0),
         time_between = ifelse(is.na(time_between), 0, time_between),
         return_30 = ifelse(is.na(return_30), 0, return_30),
         cum_time = cumsum(time_between) + cumsum(time_state))
tib_ex %>%
  group_by(id) %>%
  mutate(count = ifelse(date_in - lag(date_in, 1, default = 0) <= 365,
                        cumsum(return_30), 0))
which produces the following tibble:
# A tibble: 9 x 8
# Groups:   id [2]
     id date_in    date_out   time_between time_state return_30 cum_time count
  <dbl> <date>     <date>            <dbl>      <dbl>     <dbl>    <dbl> <dbl>
1     1 2008-07-31 2008-08-08            0          8         0        8     0
2     1 2008-08-29 2008-09-01           21          3         1       32     1
3     1 2008-09-15 2009-03-16           14        182         1      228     2
4     1 2009-05-05 2009-05-14           50          9         0      287     2
5     2 2010-08-03 2010-08-20            0         17         0       17     0
6     2 2010-08-29 2010-09-01            9          3         1       29     1
7     2 2010-09-25 2010-11-07           24         43         1       96     2
8     2 2011-09-11 2011-11-25          308         75         0      479     2
9     2 2011-12-12 2011-12-16           17          4         1      500     3
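The observations for id 1 have the correct counts, because the cumulative time never exceeds 365. The last two observations for id 2 have incorrect counts: they should be 1 and 1 respectively, not 2 and 3 (because of the decay).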
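To make the expected values concrete, here is a minimal base R check for id 2. It assumes the window covers the 365 days up to and including the current date_in (the question does not say whether the bounds are inclusive), and `window_sum` is a hypothetical helper name, not something from the original code.

```r
# date_in and return_30 for id 2, copied from the table above
date_in   <- as.Date(c("2010-08-03", "2010-08-29", "2010-09-25",
                       "2011-09-11", "2011-12-12"))
return_30 <- c(0, 1, 1, 0, 1)

# Hypothetical helper: sum return_30 over rows whose date_in lies within
# the 365 days up to and including row i's date_in (inclusive bounds assumed)
window_sum <- function(i) {
  age <- as.numeric(date_in[i] - date_in)  # days before row i; negative for later rows
  sum(return_30[age >= 0 & age <= 365])
}

expected <- sapply(seq_along(date_in), window_sum)
expected  # 0 1 2 1 1
```

By the time of the 2011-09-11 entry, the events recorded at the 2010-08-29 entry is 378 days old and drops out of the window, leaving only the event from 2010-09-25 (351 days old).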
Answer 0 (score: 0)
I found an answer to this question on StackExchange:
R: RunningTotal in the last 365 days window by Name
The following solution works:
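# Days elapsed since each id's first date_in
tib_ex <- tib_ex %>%
  group_by(id) %>%
  arrange(date_in) %>%
  mutate(day = date_in - date_in[1])
# For row i, sum return_30 over rows of the same id whose day value
# falls within the 365 days up to and including row i's day
f <- Vectorize(function(i)
  sum(tib_ex$return_30[tib_ex$id[i] == tib_ex$id &
                         tib_ex$day[i] - tib_ex$day >= 0 &
                         tib_ex$day[i] - tib_ex$day <= 365]),
  vectorize.args = "i")
tib_ex$RunningTotal365 <- f(1:nrow(tib_ex))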
This is essentially copied and pasted from the answer linked above.
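The same windowed sum can also be written inside a grouped `mutate()`, which avoids indexing into the whole table by row number. The following is only a sketch under stated assumptions: the column names `gap` and `count_365` are made up for illustration, the window bounds are assumed inclusive, and the example data is rebuilt from the question so the block runs on its own.

```r
library(dplyr)

# Example data from the question
tib <- tibble(
  id = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  date_in = as.Date(c("2008-07-31", "2008-08-29", "2008-09-15", "2009-05-05",
                      "2010-08-03", "2010-08-29", "2010-09-25",
                      "2011-09-11", "2011-12-12")),
  date_out = as.Date(c("2008-08-08", "2008-09-01", "2009-03-16", "2009-05-14",
                       "2010-08-20", "2010-09-01", "2010-11-07",
                       "2011-11-25", "2011-12-16"))
)

result <- tib %>%
  group_by(id) %>%
  arrange(date_in, .by_group = TRUE) %>%
  mutate(
    # Days between the previous exit and this entry (NA for the first row per id)
    gap = as.numeric(date_in - lag(date_out)),
    # Event flag: returned within 30 days of the previous exit
    return_30 = if_else(!is.na(gap) & gap <= 30, 1, 0),
    # Events among this id's rows in the 365 days up to and including
    # the current date_in (inclusive bounds are an assumption)
    count_365 = vapply(seq_along(date_in), function(i) {
      age <- as.numeric(date_in[i] - date_in)
      sum(return_30[age >= 0 & age <= 365])
    }, numeric(1))
  ) %>%
  ungroup()

result$count_365  # 0 1 2 2 0 1 2 1 1
```

Each row scans all rows of its own group, so the cost grows quadratically with the number of observations per id. That should be fine for small panels like this one, but may be slow for long histories.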