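Sum of positive events in a 12-month rolling window

Time: 2019-07-23 15:41:29

Tags: r dplyr tidyverse zoo rollapply

I am trying to count the number of positive events in a rolling 12-month period.

I can pad the data to one row per day (365 rows per year) and use zoo::rollapply to sum the events over every 365 rows, but my data frame is very large and I want to do this for many variables, so it takes forever to run.

I can get the correct output with the following: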

data <- data.frame(id = c("a","a","a","a","a","b","b","b","b","b"),
                   date = c("20-01-2011","20-04-2011","20-10-2011","20-02-2012",
                            "20-05-2012","20-01-2013","20-04-2013","20-10-2013",
                            "20-02-2014","20-05-2014"),
                   event = c(0,1,1,1,0,1,0,0,1,1))
library(lubridate)
library(dplyr)
library(tidyr)
library(zoo)

data %>%
  group_by(id) %>%
  mutate(date = dmy(date),
         cumsum = cumsum(event)) %>%
  # pad each id to one row per day; padded rows get event = 0 and cumsum = NA
  complete(date = full_seq(date, period = 1), fill = list(event = 0)) %>%
  mutate(event12 = rollapplyr(event, width = 365, FUN = sum, partial = TRUE)) %>%
  # drop the padded rows again
  drop_na(cumsum)

This gives:

 id     date       event cumsum event12
 <fct>  <date>     <dbl>  <dbl>   <dbl>
 a      2011-01-20     0      0       0
 a      2011-04-20     1      1       1
 a      2011-10-20     1      2       2
 a      2012-02-20     1      3       3
 a      2012-05-20     0      3       2
 b      2013-01-20     1      1       1
 b      2013-04-20     0      1       1
 b      2013-10-20     0      1       1
 b      2014-02-20     1      2       1
 b      2014-05-20     1      3       2

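But I would like to know whether there is a more efficient way, for example how I could make the width in rollapply count days rather than rows.

1 Answer:

Answer 0 (score: 0)

After converting the dates to Date class, this can be done with a self-join in a single SQL statement, without filling in the missing dates: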

library(sqldf)

data2 <- transform(data, date = as.Date(date, "%d-%m-%Y"))

sqldf("select a.*, sum(b.event) as event12
  from data2 as a
  left join data2 as b on a.id = b.id and b.date between a.date - 365 and a.date
  group by a.rowid
  order by a.rowid")

giving:

   id       date event event12
1   a 2011-01-20     0       0
2   a 2011-04-20     1       1
3   a 2011-10-20     1       2
4   a 2012-02-20     1       3
5   a 2012-05-20     0       2
6   b 2013-01-20     1       1
7   b 2013-04-20     0       1
8   b 2013-10-20     0       1
9   b 2014-02-20     1       1
10  b 2014-05-20     1       2
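
To stay within zoo, as the question asks, one possible sketch is to pass rollapplyr a vector of per-row widths computed from the dates, so that no daily padding is needed. This is not part of the original answer: the helper name roll_sum_days is hypothetical, and the sketch assumes dates are already of class Date and sorted within each id.

```r
library(zoo)

# Same sample data as in the question, with dates already parsed.
data <- data.frame(
  id = rep(c("a", "b"), each = 5),
  date = as.Date(c("2011-01-20", "2011-04-20", "2011-10-20", "2012-02-20",
                   "2012-05-20", "2013-01-20", "2013-04-20", "2013-10-20",
                   "2014-02-20", "2014-05-20")),
  event = c(0, 1, 1, 1, 0, 1, 0, 0, 1, 1)
)

# Hypothetical helper (an assumption, not from the original post):
# sum of `x` over the trailing `days`-day window ending at each date.
# `date` must be sorted in increasing order.
roll_sum_days <- function(x, date, days = 365) {
  d <- as.numeric(date)
  # For row i, count the rows up to and including i whose date is
  # later than date[i] - days; that count is the row width to use.
  w <- seq_along(d) - findInterval(d - days, d)
  rollapplyr(x, width = w, FUN = sum)
}

# Sort, then apply the helper separately within each id.
data <- data[order(data$id, data$date), ]
data$event12 <- unsplit(
  lapply(split(data, data$id), function(g) roll_sum_days(g$event, g$date)),
  data$id
)

print(data)
```

The window here covers dates later than date - 365 up to and including date, which corresponds to 365 daily rows in the padded approach. SQL's BETWEEN is inclusive at both ends, so the query above would additionally count an event falling exactly 365 days earlier; the sample data contains no such case, so both give the same event12 column.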