R - Rolling sum over a 365-day moving window

Time: 2018-08-23 16:53:29

Tags: r

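I have a dataset of individuals with repeated measurements. Each observation has two date variables: an entry date and an exit date. If the time between the previous exit date and the next entry date is 30 days or less, it is registered as an event.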

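What I want to do is track, each time an individual is measured again, how many events occurred in the preceding 365 days. I am having trouble creating this decay, that is, dropping events from the count once they are more than 365 days old.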

Below is a sample dataset, along with a solution that is not correct:

library(tidyverse)
library(lubridate)

tib_ex <- tibble(
  id = c(1, 1, 1, 1,
         2, 2, 2, 2, 2),
  date_in = ymd(c('2008-07-31', '2008-08-29', '2008-09-15', '2009-05-05', 
                  '2010-08-03', '2010-08-29', '2010-09-25', 
                  '2011-09-11', '2011-12-12')),
  date_out = ymd(c('2008-08-08', '2008-09-01', '2009-03-16', '2009-05-14', 
                   '2010-08-20', '2010-09-01', '2010-11-07',
                   '2011-11-25', '2011-12-16'))

)

tib_ex <- tib_ex %>%
  group_by(id) %>%
  mutate(time_between = as.numeric(date_in - lag(date_out)),
         time_state = as.numeric(date_out - date_in),
         return_30 = ifelse(time_between <= 30, 1, 0), 
         time_between = ifelse(is.na(time_between), 0, time_between),
         return_30 = ifelse(is.na(return_30), 0, return_30), 
         cum_time = cumsum(time_between) + cumsum(time_state))


tib_ex %>%
  group_by(id) %>%
  mutate(count = ifelse(date_in - lag(date_in, 1, default = 0) <= 365,
                        cumsum(return_30), 0))

This creates the following tibble:

# A tibble: 9 x 8
# Groups:   id [2]
     id date_in    date_out   time_between time_state return_30 cum_time count
  <dbl> <date>     <date>            <dbl>      <dbl>     <dbl>    <dbl> <dbl>
1     1 2008-07-31 2008-08-08            0          8         0        8     0
2     1 2008-08-29 2008-09-01           21          3         1       32     1
3     1 2008-09-15 2009-03-16           14        182         1      228     2
4     1 2009-05-05 2009-05-14           50          9         0      287     2
5     2 2010-08-03 2010-08-20            0         17         0       17     0
6     2 2010-08-29 2010-09-01            9          3         1       29     1
7     2 2010-09-25 2010-11-07           24         43         1       96     2
8     2 2011-09-11 2011-11-25          308         75         0      479     2
9     2 2011-12-12 2011-12-16           17          4         1      500     3

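The observations for id 1 have the correct counts, because the cumulative time never exceeds 365 days. The last two observations for id 2 have incorrect counts: because of the decay they should be 1 and 1 respectively, not 2 and 3.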
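To see where 1 and 1 come from, the day differences for id 2 can be checked by hand. The sketch below is only an illustration: it assumes the 365-day window is measured from each `date_in` back to the earlier `date_in` values of the same id, and the vector names (`days_back`, `in_window`, `expected_count`) are made up for this check.

```r
# Entry dates and event flags for id 2, copied from the table above
date_in   <- as.Date(c("2010-08-03", "2010-08-29", "2010-09-25",
                       "2011-09-11", "2011-12-12"))
return_30 <- c(0, 1, 1, 0, 1)

# days_back[i, j]: days from observation j's entry to observation i's entry
days_back <- outer(as.numeric(date_in), as.numeric(date_in), "-")

# Observation j counts towards observation i if it lies 0 to 365 days back
in_window <- days_back >= 0 & days_back <= 365

# Sum the event flags inside each observation's window
expected_count <- sapply(seq_along(date_in),
                         function(i) sum(return_30[in_window[i, ]]))
expected_count
# [1] 0 1 2 1 1
```

For the fourth observation, the event at 2010-08-29 is 378 days back and drops out, while the one at 2010-09-25 is 351 days back and stays, which gives 1. For the fifth observation, only its own event is still inside the window.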

1 Answer:

Answer 0: (score: 0)

I found an answer to this problem on StackExchange:

R: RunningTotal in the last 365 days window by Name

The following solution works:

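tib_ex <- tib_ex %>%
  group_by(id) %>%
  arrange(date_in) %>%
  # days elapsed since each id's first entry date
  mutate(day = as.numeric(date_in - date_in[1])) %>%
  ungroup()

# For row i, sum return_30 over rows with the same id whose entry date
# lies 0 to 365 days before (or on) row i's entry date
f <- Vectorize(function(i) {
  in_window <- tib_ex$id == tib_ex$id[i] &
    tib_ex$day[i] - tib_ex$day >= 0 &
    tib_ex$day[i] - tib_ex$day <= 365
  sum(tib_ex$return_30[in_window])
}, vectorize.args = "i")
tib_ex$RunningTotal365 <- f(seq_len(nrow(tib_ex)))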

It is basically copied and pasted from the answer linked above.
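The same window logic can also be written as a grouped `mutate()`, which avoids indexing the whole table from inside the function. This is a minimal sketch rather than part of the linked answer: `count_in_window()` and the `events` data frame are hypothetical names, the `return_30` flags are copied from the table in the question, and the window is again assumed to run from each `date_in` back over earlier `date_in` values of the same id.

```r
library(dplyr)

# Self-contained copy of the example: entry dates and return_30 flags
events <- data.frame(
  id = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  date_in = as.Date(c("2008-07-31", "2008-08-29", "2008-09-15", "2009-05-05",
                      "2010-08-03", "2010-08-29", "2010-09-25",
                      "2011-09-11", "2011-12-12")),
  return_30 = c(0, 1, 1, 0, 0, 1, 1, 0, 1)
)

# Hypothetical helper: for each date, sum the flags of all dates that lie
# between 0 and `window` days before it (the date itself included)
count_in_window <- function(dates, flags, window = 365) {
  vapply(seq_along(dates), function(i) {
    days_back <- as.numeric(dates[i] - dates)
    sum(flags[days_back >= 0 & days_back <= window])
  }, numeric(1))
}

# Apply the helper within each id so windows never cross individuals
events <- events %>%
  group_by(id) %>%
  mutate(RunningTotal365 = count_in_window(date_in, return_30)) %>%
  ungroup()

events$RunningTotal365
# [1] 0 1 2 2 0 1 2 1 1
```

Like the solution above, this compares every pair of rows within an id, so it is quadratic in the number of observations per individual. That is fine for data of this size, but may be slow for very long histories.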