Question

I am trying to get the cumulative sum of each column of a matrix. Here is my code in R:

testMatrix = matrix(1:65536, ncol=256);
microbenchmark(apply(testMatrix, 2, cumsum), times=10000L);

Unit: milliseconds
                         expr      min       lq     mean  median       uq      max neval
 apply(testMatrix, 2, cumsum) 1.599051 1.766112 2.329932 2.15326 2.221538 93.84911 10000

For comparison, I used Rcpp:

cppFunction('NumericMatrix apply_cumsum_col(NumericMatrix m) {
    for (int j = 0; j < m.ncol(); ++j) {
        for (int i = 1; i < m.nrow(); ++i) {
            m(i, j) += m(i - 1, j);
        }
    }
    return m;
}');
microbenchmark(apply_cumsum_col(testMatrix), times=10000L);

Unit: microseconds
                         expr     min      lq     mean  median      uq      max neval
 apply_cumsum_col(testMatrix) 205.833 257.719 309.9949 265.986 276.534 96398.93 10000

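So the C++ code is about 7.5 times faster. Is it possible to do better than apply(testMatrix, 2, cumsum) in pure R? It feels like I am paying an order of magnitude of overhead for no reason.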

Answer 1

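It is hard to beat C++ with R code. The fastest approach I can think of works if you are willing to split your matrix into a list. That way R calls the primitive function directly and does not copy the object on every iteration (apply is essentially a pretty loop). You can see that C++ still wins, but if you really want to use only R code, the list approach gives a significant speedup.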

fun1 <- function(){
    apply(testMatrix, 2, cumsum)
}

testList <- split(testMatrix, col(testMatrix))

fun2 <- function(){
    lapply(testList, cumsum)
}

microbenchmark(fun1(),
               fun2(),
               apply_cumsum_col(testMatrix),
               times=100L)


Unit: microseconds
                         expr      min        lq      mean   median        uq      max neval
                       fun1() 3298.534 3411.9910 4376.4544 3477.608 3699.2485 9249.919   100
                       fun2()  558.800  596.0605  766.2377  630.841  659.3015 5153.100   100
 apply_cumsum_col(testMatrix)  219.651  282.8570  576.9958  311.562  339.5680 4915.290   100

Edit: note that if you include the time taken to split the matrix into a list, this method is slower than fun1.
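
To make the caveat in the edit concrete, here is a minimal sketch that performs the split inside the function and rebuilds a matrix from the list, so the result can be compared directly with apply. The name fun3 is hypothetical and not part of the original answer; timing it with microbenchmark is left to the reader.

```r
testMatrix <- matrix(1:65536, ncol = 256)

# fun3 (hypothetical name): split, cumulate, and reassemble in one call,
# so the cost of building the list is included in any timing.
fun3 <- function(m) {
  cols <- split(m, col(m))             # one vector per column
  res  <- lapply(cols, cumsum)         # cumulative sum of each column
  matrix(unlist(res, use.names = FALSE), nrow = nrow(m))
}

# Same values as the apply() version
all.equal(fun3(testMatrix), apply(testMatrix, 2, cumsum))
```

Note that fun2 above returns a list, not a matrix, so a fair comparison with apply probably needs a reassembly step like this one as well.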

Answer 2

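A byte-compiled for loop is slightly faster than the apply call on my system. I expected it to be faster because it does less work than apply. As expected, the R loop is still slower than the simple C++ function you wrote.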

colCumsum <- compiler::cmpfun(function(x) {
  for (i in seq_len(ncol(x)))
    x[,i] <- cumsum(x[,i])
  x
})

testMatrix <- matrix(1:65536, ncol=256)
m <- testMatrix
require(microbenchmark)
microbenchmark(colCumsum(m), apply_cumsum_col(m), apply(m, 2, cumsum), times=100L)
# Unit: microseconds
#                 expr      min        lq    median        uq       max neval
#         colCumsum(m) 1478.671 1540.5945 1586.1185 2199.9530 37377.114   100
#  apply_cumsum_col(m)  178.214  192.4375  204.3905  234.8245  1616.030   100
#  apply(m, 2, cumsum) 1879.850 1940.1615 1991.3125 2745.8975  4346.802   100
all.equal(colCumsum(m), apply(m, 2, cumsum))
# [1] TRUE
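
As a small worked example of what the loop in this answer computes, the sketch below applies the same column-by-column update to a 3 x 2 matrix. The name col_cumsum_loop is hypothetical. It skips compiler::cmpfun on the assumption that a recent R version (3.4.0 or later) byte-compiles functions through the JIT by default; on older versions the explicit cmpfun call would still matter.

```r
# col_cumsum_loop (hypothetical name): overwrite each column with its cumulative sum
col_cumsum_loop <- function(x) {
  for (j in seq_len(ncol(x))) {
    x[, j] <- cumsum(x[, j])
  }
  x
}

small <- matrix(1:6, nrow = 3)
col_cumsum_loop(small)
#      [,1] [,2]
# [1,]    1    4
# [2,]    3    9
# [3,]    6   15
```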

Answer 3

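It may be late, but I will write up my answer so that anyone else can see it.

First, in your C++ code you need to clone the matrix; otherwise you are writing into R's memory, which CRAN forbids. Your code therefore becomes: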

rcpp_apply<-cppFunction('NumericMatrix apply_cumsum_col(NumericMatrix m) {
    NumericMatrix g=clone(m);
    for (int j = 0; j < m.ncol(); ++j) {
        for (int i = 1; i < m.nrow(); ++i) {
            g(i, j) += g(i - 1, j);
        }
    }
    return g;
}');
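
The effect of leaving out clone() can be seen in a short sketch, assuming Rcpp is installed and a C++ compiler is available. The name cumsum_inplace is hypothetical; its body is the loop from the question, without the clone. When the input already has double storage, NumericMatrix wraps the caller's memory directly, so the caller's matrix is overwritten. When the input is an integer matrix, Rcpp first coerces it to a double copy, which is why the question's benchmark did not visibly corrupt testMatrix.

```r
library(Rcpp)

# cumsum_inplace (hypothetical name): same loop as in the question, no clone()
cppFunction('NumericMatrix cumsum_inplace(NumericMatrix m) {
    for (int j = 0; j < m.ncol(); ++j) {
        for (int i = 1; i < m.nrow(); ++i) {
            m(i, j) += m(i - 1, j);
        }
    }
    return m;
}')

x_dbl <- matrix(as.numeric(1:6), nrow = 3)  # double storage: wrapped without copying
x_int <- matrix(1:6, nrow = 3)              # integer storage: coerced to a copy first

invisible(cumsum_inplace(x_dbl))
invisible(cumsum_inplace(x_int))

x_dbl[, 1]  # 1 3 6: the caller's matrix was overwritten
x_int[, 1]  # 1 2 3: only the coerced copy was modified
```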

Since typeof(testMatrix) is integer, you can change the C++ argument type to IntegerMatrix.

rcpp_apply_integer<-cppFunction('IntegerMatrix apply_cumsum_col(IntegerMatrix m) {
    IntegerMatrix g=clone(m);
    for (int j = 0; j < m.ncol(); ++j) {
        for (int i = 1; i < m.nrow(); ++i) {
            g(i, j) += g(i - 1, j);
        }
    }
    return g;
}');

This makes the code roughly 2 times faster. Here is a benchmark:

microbenchmark::microbenchmark(R=apply(testMatrix, 2, cumsum),Rcpp=rcpp_apply(testMatrix),Rcpp_integer=rcpp_apply_integer(testMatrix), times=10)

Unit: microseconds
        expr      min       lq      mean    median       uq      max neval
           R 1552.217 1706.165 1770.1264 1740.0345 1897.884 1940.989    10
        Rcpp  502.900  523.838  637.7188  665.0605  699.134  743.471    10
Rcpp_integer  220.455  274.645  274.9327  275.8770  277.930  316.109    10



all.equal(rcpp_apply(testMatrix),rcpp_apply_integer(testMatrix))
[1] TRUE

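If the matrix holds larger values, you must use NumericMatrix, because cumulative sums beyond the 32-bit integer range would overflow.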
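
The overflow concern can be illustrated in plain R, where integers are 32-bit. This is only a sketch of the limit itself: base R's cumsum returns NA with a warning on integer overflow, whereas the C++ loop in the IntegerMatrix version performs no such check, so its result after an overflow should not be relied on.

```r
big <- c(.Machine$integer.max, 1L)       # 2147483647 followed by 1

int_sum <- suppressWarnings(cumsum(big)) # integer arithmetic: second element is NA
dbl_sum <- cumsum(as.numeric(big))       # double arithmetic: 2147483647 2147483648

is.na(int_sum[2])  # TRUE
dbl_sum[2]         # 2147483648
```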

Making cumulative sum faster
