Given the following table:
library(data.table)
df <- data.table(value = c(3,1,5,6,2,5,12,6), grp = c(1,1,1,2,2,3,3,3))
value grp
1: 3 1
2: 1 1
3: 5 1
4: 6 2
5: 2 2
6: 5 3
7: 12 3
8: 6 3
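I want to add 3 new columns, each of which is a rolling sum of the `value` column, computed separately within each `grp`. This is the configuration table holding the window length and the name of each new column: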
rolling_conf <- data.table(name=c("2d", "4d", "7d"), window = c(1,2,2))
name window
1: 2d 1
2: 4d 2
3: 7d 2
I was able to do this with a for loop:
library(RcppRoll)
for (i in seq_len(nrow(rolling_conf))) {
  df[, (rolling_conf$name[i]) := roll_sumr(value, rolling_conf$window[i], na.rm = TRUE), by = grp]
}
This is the output I get (which is the desired output):
value grp 2d 4d 7d
1: 3 1 3 NA NA
2: 1 1 1 4 4
3: 5 1 5 6 6
4: 6 2 6 NA NA
5: 2 2 2 8 8
6: 5 3 5 NA NA
7: 12 3 12 17 17
8: 6 3 6 18 18
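To make the rule behind this output explicit: each new column is a right-aligned rolling sum over `window` rows that restarts in every `grp`, with `NA` wherever fewer than `window` rows are available. A minimal base-R sketch of that rule, for checking results only; the helper `roll_sum_right` is hypothetical and, unlike `roll_sumr(..., na.rm = TRUE)`, it does not skip missing values:

```r
# Hypothetical helper: right-aligned rolling sum, NA for incomplete windows.
# Assumes x contains no NA values.
roll_sum_right <- function(x, n) {
  out <- rep(NA_real_, length(x))
  if (length(x) >= n) {
    cs  <- cumsum(x)
    idx <- n:length(x)
    # sum(x[(i - n + 1):i]) == cs[i] - cs[i - n], with cs[0] taken as 0
    out[idx] <- cs[idx] - c(0, cs)[idx - n + 1]
  }
  out
}

value <- c(3, 1, 5, 6, 2, 5, 12, 6)
grp   <- c(1, 1, 1, 2, 2, 3, 3, 3)

# ave() applies the function within each group and returns the results
# in the original row order.
ave(value, grp, FUN = function(x) roll_sum_right(x, 2))
# [1] NA  4  6 NA  8 NA 17 18
```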
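I'm looking for a faster implementation (one that computes the windows in parallel rather than sequentially). I don't want to use foreach. I suspect the apply family is the way to go, but I haven't managed to write that code.
Thanks for your help!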
Answer 0 (score: 1)
Here is my solution using lapply:
library(data.table)
library(RcppRoll)
df <- data.table(value = c(3,1,5,6,2,5,12,6), grp = c(1,1,1,2,2,3,3,3))
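rolling_conf <- list("2d" = 1, "4d" = 2, "7d" = 2)
# split() returns the groups in sorted order of grp, so the stacked result
# lines up with df only if df is already sorted by grp (as it is here)
dff <- split(df$value, df$grp)
dfl <- lapply(dff, function(y) sapply(rolling_conf, function(x) roll_sumr(y, x, na.rm = TRUE)))
dfl <- do.call(rbind, dfl)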
dfl
# 2d 4d 7d
# [1,] 3 NA NA
# [2,] 1 4 4
# [3,] 5 6 6
# [4,] 6 NA NA
# [5,] 2 8 8
# [6,] 5 NA NA
# [7,] 12 17 17
# [8,] 6 18 18
cbind(df,dfl)
# value grp 2d 4d 7d
# 1: 3 1 3 NA NA
# 2: 1 1 1 4 4
# 3: 5 1 5 6 6
# 4: 6 2 6 NA NA
# 5: 2 2 2 8 8
# 6: 5 3 5 NA NA
# 7: 12 3 12 17 17
# 8: 6 3 6 18 18
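The same lapply idea can also stay inside a single data.table call. Grouping with `by = grp` writes each group's results back to its own rows, so neither `split()` nor `cbind()` is needed and `df` does not have to be sorted by `grp`. This is a sketch assuming RcppRoll's `roll_sumr()` with `fill = NA` to pad incomplete windows; it still computes the windows one after another, not in parallel.

```r
library(data.table)
library(RcppRoll)

df <- data.table(value = c(3, 1, 5, 6, 2, 5, 12, 6),
                 grp   = c(1, 1, 1, 2, 2, 3, 3, 3))
rolling_conf <- data.table(name = c("2d", "4d", "7d"), window = c(1, 2, 2))

# Within each grp, lapply() returns one rolling-sum vector per window length;
# := then assigns that list to the new columns, matched to the group's rows.
df[, (rolling_conf$name) := lapply(rolling_conf$window, function(n)
         roll_sumr(value, n, fill = NA, na.rm = TRUE)),
   by = grp]
df
```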
Answer 1 (score: 1)
A version that avoids the explicit loop by using sapply():
library(data.table)
library(RcppRoll)
# create datasets
dt <- data.table(value=c(3,1,5,6,2,5,12,6), grp=c(1,1,1,2,2,3,3,3))
rc <- data.table(name=c("2d", "4d", "7d"), window=c(1,2,2))
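# compute the grouped rolling sum once per window length; [[2]] takes the
# result column (column 1 is grp). Rows come back grouped in order of first
# appearance, so this lines up with dt only if each grp's rows are contiguous
result <- sapply(rc$window, function(x) dt[, roll_sumr(value, x, na.rm = TRUE), by = grp][[2]])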
# add back to dataset with correct column names
colnames(result) <- rc$name
cbind(dt, result)
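Note that neither lapply() nor sapply() runs in parallel; both still loop over the windows sequentially. Recent versions of data.table ship a built-in `frollsum()` that accepts a vector of window sizes and returns one result per window, and according to its documentation it can compute multiple windows in parallel. A sketch under that assumption; `frollsum()` defaults to `na.rm = FALSE`, which gives the same result here only because `value` has no missing values:

```r
library(data.table)

df <- data.table(value = c(3, 1, 5, 6, 2, 5, 12, 6),
                 grp   = c(1, 1, 1, 2, 2, 3, 3, 3))
rolling_conf <- data.table(name = c("2d", "4d", "7d"), window = c(1, 2, 2))

# frollsum() takes a vector of window sizes and returns a list with one
# right-aligned rolling sum per window; incomplete windows are filled with NA.
df[, (rolling_conf$name) := frollsum(value, as.integer(rolling_conf$window)),
   by = grp]
df
```

Because the window sizes are passed as one vector, all three columns come out of a single call per group instead of one grouped pass per window.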