Question

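My main goal is to select a non-contiguous submatrix using two binary vectors, one for the rows and one for the columns. This is one of many steps I need to perform inside an MCMC loop, which I am implementing in C++ using Rcpp, RcppArmadillo and RcppEigen.

Three possible approaches are (1) using RcppArmadillo, (2) calling my R function from Rcpp, and (3) using R directly and passing the result to C++. The last option is not convenient for me at all, though.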

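I then compared the speed of these three approaches. Interestingly, the plain R code is much faster than the other two! What surprised me even more is that calling the very same R function from Rcpp is much slower than calling it directly from R. I expected them to run at about the same speed, as in this example {{3 }}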

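In any case, the timing results look a bit strange to me. Any comments on the reason? I am using a MacBook Pro running OS X El Capitan with a 2.5 GHz Intel Core i7. Could this be related to my system, to Mac OS X, or to the way Rcpp is installed on my machine?

Thanks in advance!

Here is the code:

C++ part: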

#include <RcppArmadillo.h>

// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
using namespace arma;

// (1) Using RcppArmadillo functions:
// [[Rcpp::export]]
mat subselect(NumericMatrix X, uvec rows, uvec cols){

  mat XX(X.begin(), X.nrow(),X.ncol(), false);
  mat y = XX.submat(find(rows>0),find(cols>0));
  return (y);
}

// (2) Calling the R function from C++:
// [[Rcpp::export]]
NumericMatrix subselect2(NumericMatrix X, NumericVector rows, NumericVector cols){

  // submat is looked up in R's global environment, where it is defined.
  Environment global = Environment::global_env();
  Function submat = global["submat"];
  NumericMatrix outmat = submat(X, rows, cols);
  return outmat;
}

R part:

library(microbenchmark)

# (3) My R function:
submat <- function(mat,rvec,cvec){
 return(mat[as.logical(rvec),as.logical(cvec)])
}

# Comparing the performances:

# Generating data:
set.seed(432)
rows <- rbinom(1000,1,0.1)
cols <- rbinom(1000,1,0.1)
amat <- matrix(1:1e06,1000,1000)

# Benchmarking:
microbenchmark(subselect(amat,rows,cols),
           subselect2(amat,rows,cols),
           submat(amat,rows,cols))

Results:

                         expr     min       lq     mean    median       uq        max neval
  subselect(amat, rows, cols) 893.670 1566.882 2297.991  1675.282 2184.783   8462.142   100
 subselect2(amat, rows, cols) 928.418 1581.553 3554.805  1657.454 2060.837 138801.050   100
     submat(amat, rows, cols)  36.313   55.748   66.782    62.709   73.975    136.970   100

Answer 1

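There are a few things worth addressing here. First, you made a subtle mistake in the design of your benchmark, and it has a significant impact on the performance of your Armadillo function subselect. Observe: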

set.seed(432)
rows <- rbinom(1000, 1, 0.1)
cols <- rbinom(1000, 1, 0.1)

imat <- matrix(1:1e6, 1000, 1000)
nmat <- imat + 0.0

storage.mode(imat)
# [1] "integer"

storage.mode(nmat)
# [1] "double"

microbenchmark(
    "imat" = subselect(imat, rows, cols),
    "nmat" = subselect(nmat, rows, cols)
)
# Unit: microseconds
#  expr      min       lq      mean    median        uq       max neval
#  imat 3088.140 3218.013 4355.2956 3404.4685 4585.1095 21662.540   100
#  nmat  139.298  167.116  223.2271  209.4585  238.6875   533.035   100

Although R usually treats integer literals (e.g. 1, 2, 3, ...) as floating-point values, the sequence operator : is one of the few exceptions:

storage.mode(c(1, 2, 3, 4, 5))
# [1] "double"

storage.mode(1:5)
# [1] "integer"

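This is why the expression matrix(1:1e6, 1000, 1000) returns an integer matrix rather than a numeric matrix. That is a problem because subselect expects a NumericMatrix, not an IntegerMatrix, and passing the latter type triggers a deep copy, hence the difference of more than an order of magnitude in the benchmark above.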
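The cost difference described above can be illustrated outside of Rcpp. The following is only an analogy in plain C++, not Rcpp's actual conversion code: the names DoubleView, view_doubles and to_doubles are hypothetical. A buffer that already holds doubles can be wrapped without touching its elements, whereas an integer buffer has to be converted element by element into newly allocated storage.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical non-owning view over an existing buffer of doubles.
// Creating it costs O(1): no element is read or copied.
struct DoubleView {
    const double* data;
    std::size_t n;
};

// Wrap a double buffer in place (the cheap path).
DoubleView view_doubles(const std::vector<double>& buf) {
    return DoubleView{buf.data(), buf.size()};
}

// An int buffer cannot be reinterpreted as doubles, so a new buffer
// is allocated and every element is converted: O(n) time and memory.
std::vector<double> to_doubles(const std::vector<int>& buf) {
    return std::vector<double>(buf.begin(), buf.end());
}
```

For a 1000 x 1000 matrix the second path converts a million elements on every call, which is the kind of overhead the imat timing above includes.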

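Second, the relative performance of the R function submat and the C++ function subselect varies considerably with the distribution of the binary index vectors, presumably because of differences in the underlying algorithms. For sparser indices (a larger proportion of 0s to 1s) the R function wins; for denser indices the opposite is true. This also appears to be a function of the matrix size (or possibly just its dimensions), as shown in the plots for this comparison, where the row and column index vectors were generated with rbinom using success parameters 0.0, 0.05, 0.10, ..., 0.95, 1.0, first with a 1e3 x 1e3 matrix and then with a 1e3 x 1e4 matrix. The code for this is included at the end.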
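To make the operation being timed concrete, here is a minimal sketch of mask-based submatrix selection in plain C++. It is neither R's nor Armadillo's implementation; the function names which_nonzero and select_submatrix are hypothetical, and a column-major layout (as R uses) is assumed. It shows the two parts of the work: scanning both masks, then copying one element per selected (row, column) pair, so the copy cost grows with the density of the masks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Positions of the nonzero entries of a 0/1 mask.
std::vector<std::size_t> which_nonzero(const std::vector<int>& mask) {
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < mask.size(); ++i) {
        if (mask[i] != 0) idx.push_back(i);
    }
    return idx;
}

// Select the rows and columns flagged in the masks from a column-major
// nrow x ncol matrix stored in x. The result is column-major as well.
std::vector<double> select_submatrix(const std::vector<double>& x,
                                     std::size_t nrow, std::size_t ncol,
                                     const std::vector<int>& row_mask,
                                     const std::vector<int>& col_mask) {
    assert(x.size() == nrow * ncol);
    assert(row_mask.size() == nrow && col_mask.size() == ncol);

    const std::vector<std::size_t> rows = which_nonzero(row_mask);
    const std::vector<std::size_t> cols = which_nonzero(col_mask);

    std::vector<double> out;
    out.reserve(rows.size() * cols.size());
    for (std::size_t j : cols) {        // walk the selected columns
        for (std::size_t i : rows) {    // then the selected rows within each
            out.push_back(x[i + j * nrow]);
        }
    }
    return out;
}
```

With success probability p for both masks, the copy loop touches roughly p^2 * nrow * ncol elements, while the mask scan always costs nrow + ncol steps.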

Benchmark code:

library(data.table)
library(microbenchmark)
library(ggplot2)

test_data <- function(nr, nc, p, seed = 123) {
    set.seed(seed)
    list(
        x = matrix(rnorm(nr * nc), nr, nc),
        rv = rbinom(nr, 1, p),
        cv = rbinom(nc, 1, p)
    )
}

tests <- lapply(seq(0, 1, 0.05), function(p) {
    lst <- test_data(1e3, 1e3, p)
    list(
        p = p,
        benchmark = microbenchmark::microbenchmark(
            R = submat(lst[[1]], lst[[2]], lst[[3]]),
            Arma = subselect(lst[[1]], lst[[2]], lst[[3]])
        )
    )
})

gt <- rbindlist(
    Map(function(g) {
        data.table(g[[2]])[
            ,.(Median.us = median(time / 1000)), 
            by = .(Expr = expr)
        ][order(Median.us)][
            ,Relative := Median.us / min(Median.us)
        ][,pSuccess := sprintf("%3.2f", g[[1]])]
    }, tests)
)

ggplot(gt) +
    geom_point(
        aes(
            x = pSuccess, 
            y = Relative, 
            color = Expr
        ),
        size = 2,
        alpha = 0.75
    ) +
    theme_bw() +
    ggtitle("1e3 x 1e3 Matrix")

## change `test_data(1e3, 1e3, p)` to
## `test_data(1e3, 1e4, p)` inside of 
## `tests <- lapply(...) ...` to generate 
## the second plot

Comparing the speed of RcppArmadillo and R for non-contiguous submatrix selection