RcppRoll: n = number of months, not number of observations

Posted: 2018-11-27 12:34:52

Tags: r data.table

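I have run into a problem with the RcppRoll package. I want to use it to sum the values of the past 3 months, but sometimes there is no data for one or more months. `n = 3` looks at the last three observations, not the last three months. I couldn't find a reliable solution, so I'm trying my luck here. Thanks in advance for any suggestions.

P.S. I prefer to use data.table and RcppRoll (roll_sumr), because my dataset is large and I'm familiar with them.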

Code:

library("data.table")
library("RcppRoll")

test = data.table(id = rep(1, 8),date = c("2015-01","2015-02","2015-03","2015-04","2015-08","2015-09","2015-10","2015-11"), value = 1:8)
test = test[, var:= roll_sumr(value, n = 3, na.rm = TRUE), by = id]

   id    date value var
1:  1 2015-01     1  NA
2:  1 2015-02     2  NA
3:  1 2015-03     3   6
4:  1 2015-04     4   9
5:  1 2015-08     5  12
6:  1 2015-09     6  15
7:  1 2015-10     7  18
8:  1 2015-11     8  21

Expected output

prefered_outcome = data.table(id = rep(1, 8),date = c("2015-01","2015-02","2015-03","2015-04","2015-08","2015-09","2015-10","2015-11"), value = 1:8,var = c(NA, NA, 6, 9, NA, NA, 18, 21))
   id    date value var
1:  1 2015-01     1  NA
2:  1 2015-02     2  NA
3:  1 2015-03     3   6
4:  1 2015-04     4   9
5:  1 2015-08     5  NA
6:  1 2015-09     6  NA
7:  1 2015-10     7  18
8:  1 2015-11     8  21

2 Answers:

Answer 0 (score: 1)

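Define ym as a yearmon class column and check whether the previous ym and the one before it are exactly one and two months earlier. If so, use roll_sumr; otherwise use NA.

library(zoo)
# yearmon stores year + (month - 1)/12, so one month corresponds to 1/12.
# ym is stored as a column so that, inside by = id, it is subset per group like value.
test[, ym := as.yearmon(date)]
test[, roll := ifelse(ym - 1/12 == shift(ym) & ym - 2/12 == shift(ym, 2),
                      roll_sumr(value, 3, na.rm = TRUE), NA), by = id]
test[, ym := NULL]

giving:

> test
   id    date value roll
1:  1 2015-01     1   NA
2:  1 2015-02     2   NA
3:  1 2015-03     3    6
4:  1 2015-04     4    9
5:  1 2015-08     5   NA
6:  1 2015-09     6   NA
7:  1 2015-10     7   18
8:  1 2015-11     8   21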
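The check above is hard-coded for a 3-month window and relies on yearmon arithmetic in fractions of a year. A minimal base-R sketch that generalizes the same idea to any window length k, using an integer month index instead, could look like this. The helpers `month_index` and `roll_months_sum` are hypothetical names, and the sketch assumes "YYYY-MM" strings that are sorted and unique within each id. The key observation: if k consecutive rows span exactly k - 1 months, those k months must be consecutive, so none is missing.

```r
# Hypothetical helper: integer month index (year * 12 + month - 1) from "YYYY-MM"
month_index <- function(ym) {
  y <- as.integer(substr(ym, 1, 4))
  m <- as.integer(substr(ym, 6, 7))
  y * 12L + m - 1L
}

# Hypothetical helper: sum of the k rows ending at each row, but only when those
# k rows cover k consecutive calendar months; otherwise NA.
# Assumes ym is sorted and has no duplicate months.
roll_months_sum <- function(ym, value, k = 3L) {
  m <- month_index(ym)
  n <- length(m)
  out <- rep(NA_real_, n)
  for (i in seq_len(n)) {
    # && short-circuits, so m[i - k + 1L] is only read when i >= k
    if (i >= k && m[i] - m[i - k + 1L] == k - 1L) {
      out[i] <- sum(value[(i - k + 1L):i])
    }
  }
  out
}
```

With data.table this would be applied per group, e.g. `test[, var := roll_months_sum(date, value), by = id]` on the question's `test` (where `date` is still a character column). Integer months avoid any reliance on exact floating-point equality of fractional years.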

Answer 1 (score: 0)

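You can first add the missing months and then apply the function. Afterwards, the added months can be removed again.

library(data.table)
library(RcppRoll)
library(zoo)
test = data.table(id = rep(1, 8),date = c("2015-01","2015-02","2015-03","2015-04","2015-08","2015-09","2015-10","2015-11"), value = 1:8)
test[, date := as.yearmon(date)]
# Complete monthly grid per id, from its first to its last month (assumes rows are sorted by date within id)
allMonths <- test[, .(date = as.yearmon(seq(as.Date(date[1]), as.Date(date[.N]), by = "month"))), by = id]
# Right join: added months get value = NA but keep their id, so by = id still groups them correctly
df3 <- test[allMonths, on = .(id, date)]
# na.rm = FALSE, so any window that touches an added (NA) month gives NA
df3[, var := roll_sumr(value, n = 3, na.rm = FALSE), by = id]
# Remove the added months again by copying var back onto the original rows only
test[df3, on = .(id, date), var := i.var]
test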
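The same "fill in the missing months, roll, then drop them again" idea can be sketched in base R without data.table, RcppRoll or zoo. This is an illustrative alternative, not the answer's method: it swaps in `stats::filter` (a right-aligned convolution with weights `rep(1, k)`, which returns NA whenever the window contains an NA) for `roll_sumr`. The helper names `month_index` and `fill_and_roll` are hypothetical, and the sketch assumes one id per call with unique "YYYY-MM" strings.

```r
# Hypothetical helper: integer month index (year * 12 + month - 1) from "YYYY-MM"
month_index <- function(ym) {
  y <- as.integer(substr(ym, 1, 4))
  m <- as.integer(substr(ym, 6, 7))
  y * 12L + m - 1L
}

# Hypothetical helper: rolling k-month sum on a complete month grid
fill_and_roll <- function(ym, value, k = 3L) {
  idx <- month_index(ym)
  pos <- idx - min(idx) + 1L       # position of each observed row on the full grid
  full <- rep(NA_real_, max(pos))  # one slot per calendar month; NA = missing month
  full[pos] <- value
  # sides = 1: each output is the sum of the current and the k - 1 previous slots;
  # an NA anywhere in that window makes the result NA
  rolled <- stats::filter(full, rep(1, k), sides = 1)
  as.numeric(rolled)[pos]          # keep only the original months
}
```

Per id, this would be used as `test[, var := fill_and_roll(date, value), by = id]` on the question's `test` with a character `date` column. Calling `stats::filter` explicitly avoids a clash with `dplyr::filter` if dplyr is attached.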