Question

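I am trying to efficiently perform the following conditional cumulative sum in base R, but I am struggling to access previously computed elements on the fly. Code with a for loop: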

input <- c(6, 4, 8, 2, 2, 4, 2, 6)    
indx <- c(1, 1, 2, 2, 4, 3, 4, 5)
desired_out <- rep(0, length(input))
for (i in seq_along(desired_out)) {
    print(desired_out[i] <- desired_out[indx[i]] + input[i])
}
# [1] 6
# [1] 10
# [1] 18
# [1] 12
# [1] 14
# [1] 22
# [1] 14
# [1] 20

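The desired output is the vector c(6, 10, 18, 12, 14, 22, 14, 20). It resembles a conditional cumulative sum, but a vectorized shortcut such as cumsum(input)[indx] + input does not reproduce it (here it gives 12 10 18 12 22 22 22 28), because each element depends on previously computed output values rather than on the running sum of input.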
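To make the dependency on earlier outputs concrete, here is a minimal sketch that wraps the loop from the question in a helper and compares it with the vectorized attempt. The name `cond_cumsum` is made up for illustration and is not part of the original post.

```r
# Hypothetical helper wrapping the for loop from the question.
cond_cumsum <- function(input, indx) {
  out <- numeric(length(input))
  for (i in seq_along(out)) {
    # Reads a previously filled slot of `out`
    # (or the initial 0 if that slot has not been written yet).
    out[i] <- out[indx[i]] + input[i]
  }
  out
}

input <- c(6, 4, 8, 2, 2, 4, 2, 6)
indx  <- c(1, 1, 2, 2, 4, 3, 4, 5)

loop_result <- cond_cumsum(input, indx)
vectorized  <- cumsum(input)[indx] + input

loop_result
# [1]  6 10 18 12 14 22 14 20
vectorized
# [1] 12 10 18 12 22 22 22 28
```

The two vectors differ because cumsum(input) accumulates over input itself, while the loop accumulates over values it has already written to the output.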

Answer 1

If speed is the main concern, the OP's code can easily be converted to C++ with Rcpp, as follows:

Sample data:

library(data.table)
set.seed(0L)
M <- 1e6
ngrps <- 1e3
DT <- data.table(input=sample(10, M, replace=TRUE),
    indx=sort(sample(ngrps, M, replace=TRUE)))

# DT <- data.table(input=c(6, 4, 8, 2, 2),    
#         indx=c(1, 1, 2, 2, 4))

C++ code:

library(Rcpp)
# system.time() here times the one-off compilation done by cppFunction(), not a call to func()
system.time(
    cppFunction(
    "NumericVector func(NumericVector input, NumericVector indx) {
        const int len = input.size();
        NumericVector ret(len, 0.0);
        for (int k=0; k<len; k++) {
            ret[k] = ret[indx[k]-1] + input[k];
        }
        return ret;
    }")
)
#  user  system elapsed 
#  0.04    0.05    6.64

Keep in mind that C++ uses zero-based indexing, hence the indx[k]-1.
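Because of that offset, an index of 0, or one larger than the vector length, would map to a position outside the vector inside the C++ loop. A small R-side guard, run before calling the compiled function, is one way to catch this early. The sketch below is an assumption added for illustration; `valid_indx` is a hypothetical name, not part of the original answer.

```r
# Hypothetical guard, not part of the original answer:
# TRUE only if every element of `indx` is a valid 1-based position in `input`.
valid_indx <- function(input, indx) {
  length(indx) == length(input) &&
    !anyNA(indx) &&
    all(indx == floor(indx)) &&              # whole numbers only
    all(indx >= 1 & indx <= length(input))   # inside 1..length(input)
}

valid_indx(c(6, 4, 8), c(1, 1, 2))  # TRUE
valid_indx(c(6, 4, 8), c(0, 1, 2))  # FALSE: 0 would become position -1 in C++
valid_indx(c(6, 4, 8), c(1, 1, 4))  # FALSE: 4 is past the end of the vector
```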

Checking against the OP's example:

input <- c(6, 4, 8, 2, 2, 4, 2, 6)    
indx <- c(1, 1, 2, 2, 4, 3, 4, 5)
func(input, indx)
#[1]  6 10 18 12 14 22 14 20

Timing and a sample call using data.table syntax:

system.time(DT[, func(input, indx)])
#  user  system elapsed 
#  0.00    0.01    0.02

Speed comparison with the R loop:

library(microbenchmark)
M <- 1e6
ngrps <- 1e3
input <- sample(10, M, replace=TRUE)
indx <- sort(sample(ngrps, M, replace=TRUE))
microbenchmark(
  rcpp = func(input, indx),
  Rloop = {
    desired_out <- rep(0, length(input))
    for (i in seq_along(desired_out)) {
      desired_out[i] <- desired_out[indx[i]] + input[i]
    }},
  unit = 'relative',
  times = 100)

# Unit: relative
# expr       min       lq     mean   median       uq       max neval
# rcpp   1.00000  1.00000 1.000000  1.00000 1.000000 1.0000000   100
# Rloop 14.80781 11.37963 6.712257 10.44288 6.244126 0.7554706   100

Answer 2

sapply should be faster (no benchmark is given here):

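# desired_out must already exist, because <<- modifies it in the enclosing environment
desired_out <- rep(0, length(input))
sapply(seq_along(input), function(i) {
  desired_out[i] <<- desired_out[indx[i]] + input[i]
})
# [1]  6 10 18 12 14 22 14 20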
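The `<<-` assignment above modifies a `desired_out` vector that lives outside the anonymous function, in this case in the global environment. As a sketch of one possible variation (the wrapper name `cond_cumsum_apply` is hypothetical), the state can be kept local to a function, with vapply used so that the result type is fixed:

```r
# Hypothetical wrapper: same idea as the sapply call, but `out` stays local.
cond_cumsum_apply <- function(input, indx) {
  out <- numeric(length(input))
  vapply(seq_along(input), function(i) {
    # `<<-` finds `out` in the enclosing function, not in the global environment.
    # The assignment evaluates to the assigned value, which vapply collects.
    out[i] <<- out[indx[i]] + input[i]
  }, numeric(1))
}

input <- c(6, 4, 8, 2, 2, 4, 2, 6)
indx  <- c(1, 1, 2, 2, 4, 3, 4, 5)
cond_cumsum_apply(input, indx)
# [1]  6 10 18 12 14 22 14 20
```

This still iterates element by element in R, so it is a tidier way to scope the state rather than a guaranteed speed-up over the for loop.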

How to do a conditional cumulative sum that requires dynamic access to previously computed elements?
