R rolling sum with multiple windows, by group

Date: 2018-07-21 15:48:00

Tags: r

Given the following table:

library(data.table)
df <- data.table(value = c(3,1,5,6,2,5,12,6), grp = c(1,1,1,2,2,3,3,3))

   value grp
1:     3   1
2:     1   1
3:     5   1
4:     6   2
5:     2   2
6:     5   3
7:    12   3
8:     6   3

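I want to add 3 new columns, each of which is a rolling sum of the "value" column, grouped by the "grp" column. Here is the configuration table, which holds the window length and the name of each new column: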

rolling_conf <- data.table(name=c("2d", "4d", "7d"), window = c(1,2,2))

   name window
1:   2d      1
2:   4d      2
3:   7d      2
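To make the target columns concrete: each new column is a right-aligned rolling sum of "value" over the given window, restarted in every group and padded with NA where the group has fewer than `window` values so far. The sketch below spells that out in base R. `roll_sum_right` is a hypothetical helper written for illustration (it is not part of RcppRoll), and it assumes `value` has no NA and that the window is at least 1.

```r
# Hypothetical helper (not part of RcppRoll): right-aligned rolling sum of
# width k (k >= 1), NA where fewer than k values are available.
# Assumes x contains no NA values.
roll_sum_right <- function(x, k) {
  n <- length(x)
  if (k > n) return(rep(NA_real_, n))
  cs <- cumsum(x)
  # Sum of the last k values = cumulative sum minus the cumulative sum
  # from k positions earlier (0 before the start of the vector).
  out <- cs - c(rep(0, k), head(cs, -k))
  out[seq_len(k - 1)] <- NA
  out
}

value <- c(3, 1, 5, 6, 2, 5, 12, 6)
grp   <- c(1, 1, 1, 2, 2, 3, 3, 3)

# ave() applies FUN within each group and returns a vector aligned with
# the original rows.
two_day <- ave(value, grp, FUN = function(v) roll_sum_right(v, 2))
two_day
```

For window 2 this yields NA, 4, 6, NA, 8, NA, 17, 18, the same values as the "4d" column in the desired output further down. The cumsum trick is only a way to show the arithmetic; with long non-integer vectors it can accumulate rounding error, which dedicated rolling-sum routines are designed to avoid.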

I was able to do this with a for loop:

library(RcppRoll)
for(i in 1:nrow(rolling_conf)){
  df[ , rolling_conf$name[i] := roll_sumr(value, rolling_conf$window[i], na.rm=T), grp]
}

This is the output I get (which is the desired output):

   value grp 2d 4d 7d
1:     3   1  3 NA NA
2:     1   1  1  4  4
3:     5   1  5  6  6
4:     6   2  6 NA NA
5:     2   2  2  8  8
6:     5   3  5 NA NA
7:    12   3 12 17 17
8:     6   3  6 18 18

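I'm looking for a faster implementation that computes the columns in parallel rather than one after another. I don't want to use foreach. I suspect the apply family is the way to go, but I haven't managed to write that code.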
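One possible direction, assuming data.table 1.12.0 or later (a release that postdates this question): `frollsum()` accepts a vector of window sizes and returns one rolling sum per window, so all three columns can be assigned in a single grouped `:=` without a loop. As I read the `?froll` help page, multiple windows can be computed on several OpenMP threads, which is the closest built-in match to the "parallel, but not foreach" requirement. A minimal sketch:

```r
library(data.table)  # frollsum() requires data.table >= 1.12.0

df <- data.table(value = c(3, 1, 5, 6, 2, 5, 12, 6),
                 grp   = c(1, 1, 1, 2, 2, 3, 3, 3))
rolling_conf <- data.table(name = c("2d", "4d", "7d"), window = c(1, 2, 2))

# With a vector of window sizes, frollsum() returns a list holding one
# right-aligned rolling sum per window (NA-filled at the start), and :=
# assigns the list elements to the new columns within each group.
df[, (rolling_conf$name) := frollsum(value, rolling_conf$window), by = grp]
df
```

Note that `frollsum()` defaults to `na.rm = FALSE`, unlike the `na.rm = TRUE` used with `roll_sumr()` above; with this data (no NA values) the results are the same.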

Thanks for your help!

2 Answers:

Answer 0 (score: 1)

Here is my solution using lapply:

library(data.table)
library(RcppRoll)
df <- data.table(value = c(3,1,5,6,2,5,12,6), grp = c(1,1,1,2,2,3,3,3))
rolling_conf <- list("2d" = 1, "4d"= 2, "7d" = 2)
dff <- split(df$value, df$grp)

dfl <- lapply(dff, function(y) sapply(rolling_conf, function(x) roll_sumr(y, x, na.rm=T)))


dfl <- do.call(rbind, dfl)
dfl
#      2d 4d 7d
# [1,]  3 NA NA
# [2,]  1  4  4
# [3,]  5  6  6
# [4,]  6 NA NA
# [5,]  2  8  8
# [6,]  5 NA NA
# [7,] 12 17 17
# [8,]  6 18 18


cbind(df,dfl)
#    value grp 2d 4d 7d
# 1:     3   1  3 NA NA
# 2:     1   1  1  4  4
# 3:     5   1  5  6  6
# 4:     6   2  6 NA NA
# 5:     2   2  2  8  8
# 6:     5   3  5 NA NA
# 7:    12   3 12 17 17
# 8:     6   3  6 18 18
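Two remarks on this approach, with a sketch that addresses both. First, `lapply()` still runs sequentially; the base `parallel` package offers `mclapply()` as a drop-in replacement that forks worker processes (forking is not available on Windows, where only `mc.cores = 1` is allowed). Second, `split()` returns the groups in sorted order of grp, so `cbind(df, dfl)` lines up only when df is already sorted by grp. Because the stacked rows follow the same permutation as `order(df$grp)`, indexing with `order(ord)` restores the original row order. The core count of 2 is an arbitrary choice for illustration.

```r
library(data.table)
library(RcppRoll)
library(parallel)  # ships with base R

df <- data.table(value = c(3, 1, 5, 6, 2, 5, 12, 6), grp = c(1, 1, 1, 2, 2, 3, 3, 3))
rolling_conf <- list("2d" = 1, "4d" = 2, "7d" = 2)

# split() yields groups in sorted order of grp, keeping the original row
# order inside each group: the same permutation as order(df$grp).
ord <- order(df$grp)
dff <- split(df$value, df$grp)

# mclapply() forks workers; on Windows only mc.cores = 1 (sequential) works.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
dfl <- mclapply(dff, function(y)
  sapply(rolling_conf, function(x) roll_sumr(y, x, na.rm = TRUE)),
  mc.cores = n_cores)
dfl <- do.call(rbind, dfl)

# Row i of dfl belongs to row ord[i] of df; order(ord) undoes the permutation.
result <- cbind(df, dfl[order(ord), , drop = FALSE])
result
```

To check the realignment, shuffle the rows first (for example `df <- df[c(4, 1, 6, 2, 5, 7, 3, 8)]`) and compare each new column against the group's values. For groups this small, the cost of forking will outweigh any parallel speed-up; it only pays off with many large groups.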

Answer 1 (score: 1)

A version that uses sapply() to avoid an explicit loop:

library(data.table)
library(RcppRoll)

# create datasets
dt <- data.table(value=c(3,1,5,6,2,5,12,6), grp=c(1,1,1,2,2,3,3,3))
rc <- data.table(name=c("2d", "4d", "7d"), window=c(1,2,2))

# implement rolling sum according various window lengths
result <- sapply(as.list(rc$window), function(x) dt[ , roll_sumr(value, x, na.rm=T), by=grp][[2]])

# add back to dataset with correct column names
colnames(result) <- rc$name
cbind(dt, result)
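A variant of the same idea, sketched as an alternative rather than the answerer's code: `dt[, ..., by = grp][[2]]` returns results grouped by first appearance of each grp value, so `cbind(dt, result)` lines up only when each group's rows are contiguous. Doing the assignment inside one grouped call avoids that. `lapply()` over `rc$window` returns a list with one rolling sum per window, and `:=` writes each element back into the rows of the current group. This still runs sequentially.

```r
library(data.table)
library(RcppRoll)

dt <- data.table(value = c(3, 1, 5, 6, 2, 5, 12, 6), grp = c(1, 1, 1, 2, 2, 3, 3, 3))
rc <- data.table(name = c("2d", "4d", "7d"), window = c(1, 2, 2))

# For each group, build a list of rolling sums (one per window) and assign
# the list elements to the new columns named in rc$name, in place.
dt[, (rc$name) := lapply(rc$window, function(k) roll_sumr(value, k, na.rm = TRUE)),
   by = grp]
dt
```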