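I have a data.table containing a time series, and I am trying to compute several aggregates over overlapping time intervals. For example, for February I want the mean over the data from January and February, for March the mean over February and March, and so on.
I can compute this with a for loop, but since my data.table has 300k rows and multiple variables, I wonder whether there is a more efficient or more elegant way to do it. I tried using rollapply from the zoo package in various ways, but did not get the expected result.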
library(data.table)
library(zoo)
# sample data
dt <- data.table(day = Sys.Date() - 100:1, var = 1:100)
dt[, month := month(day)]
# by 1 month is pretty obvious
dt[, mean(var), by = month]
#    month   V1
# 1:     7  1.5
# 2:     8 18.0
# 3:     9 48.5
# 4:    10 79.0
# 5:    11 97.5
# by 2 months - solution using for loop = expected result
for (m in unique(dt[, month])[-1]) {
dt[month == m, res := mean(dt[month %in% c(m, m-1), var])]
}
dt[, unique(res), by = month]
#    month V1
# 1:     7 NA
# 2:     8 17
# 3:     9 33
# 4:    10 64
# 5:    11 82
# one of the things I tried
dt[, res := NULL]
lw <- dt[, .N, by = month][, N]
lw <- as.list(lw[-1] + lw[-length(lw)])
dt[, rollapplyr(var, width = lw, mean, fill = NA), by = month]
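
One reason the attempt above cannot work is that `by = month` hands each call only the rows of a single month, so no window can reach back into the previous month. A possible alternative, offered as a minimal sketch and not as a tested answer for the full 300k-row case, is to aggregate each month to a sum and a row count first, combine neighbouring months with `shift()`, and join the result back. The names `agg`, `s` and `n` are illustrative, and the fixed end date is an assumption made only so that the sample reproduces the month boundaries shown above. The sketch also assumes the rows are sorted by day and that the months are consecutive within one year; data spanning several years would need a year-month key instead of `month`.

```r
library(data.table)

# Sample data with a fixed end date (assumption) so the result is reproducible
dt <- data.table(day = as.Date("2023-11-07") - 100:1, var = 1:100)
dt[, month := month(day)]

# Step 1: one row per month holding the sum and the number of observations
agg <- dt[, .(s = sum(var), n = .N), by = month]

# Step 2: mean over the current and the previous month;
# shift() lags by one row, so the first month has no predecessor and gets NA
agg[, res := (s + shift(s)) / (n + shift(n))]

# Step 3: write the per-month result back to every row with an update join
dt[agg, res := i.res, on = "month"]

dt[, unique(res), by = month]
```

Because the mean of a union is the total sum divided by the total count, this gives the same values as the for loop while scanning the large table only once. For a window of k months, `frollsum(s, k) / frollsum(n, k)` from data.table could take the place of the `shift()` expression.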