Question

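I know that the result of a matrix multiplication is symmetric. Is there an R package or some standard method that speeds up the calculation by computing only the lower or upper triangle and then copying the result to the other half?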

I know that tcrossprod benefits from this fact when only one argument is supplied, but I want to supply two matrices.

Here is an example where the result is symmetric:

n <- 100
m <- 200
s <- matrix(runif(n^2), n, n)
s[lower.tri(s)] <- t(s)[lower.tri(s)]
x <- matrix(runif(m*n), m, n)
x %*% s %*% t(x)

tcrossprod does not seem to be the solution:

library(microbenchmark)
microbenchmark(x %*% s %*% t(x), tcrossprod(x %*% s, x))

I tried using Rcpp, and even without the copying step this is slower than R's multiplication (although I freely admit I am a beginner C++/Rcpp user):

w <- s %*% t(x)
mm = Rcpp::cppFunction(
'NumericMatrix mmult(NumericMatrix m , NumericMatrix v) {
  NumericMatrix out(m.nrow(), v.ncol());
  for (int i = 0; i < m.nrow(); i++) {
    for (int j = 0; j < i + 1; j++) {
      for(int k = 0; k < m.ncol(); k++){
        out(i,j) += m(i,k) * v(k,j) ;
      }
    }
  }
  return out;
}'
)
microbenchmark(mm(x, w), x %*% w)

I think this could be solved if the sym variable in the .Internal function were exposed and could be set to true by the user. However, I don't really want to mess around with such things...
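The "copy" step described in the question can be written with the same lower.tri/upper.tri indexing the example already uses. The following is only a minimal sketch: the names full, half and mirrored are made up for illustration, and the lower triangle is obtained here by masking a full product rather than by a routine that actually skips half of the work.

```r
set.seed(1)
n <- 4
m <- 6
s <- matrix(runif(n^2), n, n)
s[lower.tri(s)] <- t(s)[lower.tri(s)]   # make s symmetric, as in the question
x <- matrix(runif(m * n), m, n)

# Reference result: the full symmetric product
full <- x %*% s %*% t(x)

# Stand-in for a routine that fills only the lower triangle (and diagonal)
half <- full
half[upper.tri(half)] <- 0

# Copy step: mirror the lower triangle into the upper triangle
mirrored <- half
mirrored[upper.tri(mirrored)] <- t(mirrored)[upper.tri(mirrored)]

stopifnot(isTRUE(all.equal(mirrored, full)))
```

The mirroring itself is a single vectorised assignment, so any saving would have to come from the triangle computation, not from this step.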

Answer 1

The Matrix package does not appear to take advantage of the symmetry:

> n <- 100
> x <- s <- matrix(runif(n^2),n,n)
> s[lower.tri(s)] <- t(s)[lower.tri(s)]
> 
> library(Matrix)
> s_sym <- Matrix(forceSymmetric(s))
> class(s_sym) # has the symmetric class
[1] "dsyMatrix"
attr(,"package")
[1] "Matrix"
> 
> library(microbenchmark)
> microbenchmark(x %*% x, s %*% s, s_sym %*% s_sym)
Unit: microseconds
            expr min  lq mean median  uq  max neval
         x %*% x 461 496  571    528 625 1008   100
         s %*% s 461 499  560    532 572  986   100
 s_sym %*% s_sym 553 568  667    624 701 1117   100

and nothing in the help file suggests that it does:

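The basic matrix product, %*%, is implemented for all our Matrix and also for sparseVector classes, fully analogously to R's base matrix and vector objects. The functions crossprod and tcrossprod are matrix products or "cross products", ideally implemented efficiently without computing t(.)'s unnecessarily. They also return symmetricMatrix classed matrices when easily detectable, e.g., in crossprod(m), the one argument case. tcrossprod() takes the cross-product of the transposes of its arguments. tcrossprod(x) is formally equivalent to, but faster than, the call x %*% t(x), and so is tcrossprod(x, y) instead of x %*% t(y).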
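The equivalences stated in the help text can be checked directly. The sketch below uses base R's tcrossprod on ordinary matrices (not the Matrix classes the quote refers to) and small made-up dimensions; the names one_arg and two_arg are illustrative only. It checks values, not speed.

```r
set.seed(42)
n <- 5
m <- 8
s <- matrix(runif(n^2), n, n)
s[lower.tri(s)] <- t(s)[lower.tri(s)]   # make s symmetric
x <- matrix(runif(m * n), m, n)

# One-argument case: tcrossprod(x) is x %*% t(x)
one_arg <- tcrossprod(x)

# Two-argument case: tcrossprod(a, b) is a %*% t(b)
two_arg <- tcrossprod(x %*% s, x)

stopifnot(
  isTRUE(all.equal(one_arg, x %*% t(x))),
  isTRUE(all.equal(two_arg, x %*% s %*% t(x))),
  isSymmetric(one_arg)
)
```

In the two-argument form the routine cannot know from its inputs that the result is symmetric, which is consistent with the benchmark in the question showing no benefit.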

One option is to write a wrapper function with Rcpp around the BLAS functions provided in R_ext/BLAS.h. You can do this as follows. Create a func.cpp like this:

// added to get $(BLAS_LIBS) in compile flags
//[[Rcpp::depends(RcppArmadillo)]]
#include <Rcpp.h>
#include <R_ext/BLAS.h>

/*
  Wrapper for BLAS dsymm. See dsymm http://www.netlib.org/lapack/explore-html/d1/d54/group__double__blas__level3_ga253c8edb8b21d1b5b1783725c2a6b692.html#ga253c8edb8b21d1b5b1783725c2a6b692
  Only works with side = 'R', i.e. it computes C = alpha * B * A with A symmetric
  Note: the input matrices are passed by reference with &
*/
// [[Rcpp::export]]
Rcpp::NumericMatrix blas_dsymm(
    char uplo, int m, int n, double alpha,
    const Rcpp::NumericMatrix &A, const Rcpp::NumericMatrix &B){
  // set lda, ldb and ldc
  int lda = n, ldb = m, ldc = m;

  // make new matrix with dim(m, n)
  Rcpp::NumericMatrix C(m, n); // default values are zero
  double beta = 0;

  F77_CALL(dsymm)(
    "R" /* side */, &uplo, &m, &n, &alpha, 
    A.begin(), &lda, B.begin(), &ldb, &beta, C.begin(), &ldc);

  return(C);
}

Then run the following R script:

> n <- 100
> m <- 200
> s<-matrix(runif(n^2),n,n)
> s[lower.tri(s)] <- t(s)[lower.tri(s)]
> x <- matrix(runif(m*n), m, n)
> 
> library("Rcpp")
> sourceCpp("func.cpp")
> 
> out <- x %*% s
> out_blas <- blas_dsymm(
+   uplo = "U", m = nrow(x), n = ncol(x), 
+   alpha = 1, A = s, B = x)
> 
> all.equal(out, out_blas)
[1] TRUE
> 
> library(microbenchmark)
> microbenchmark(
+   dense = x %*% s,
+   BLAS = blas_dsymm(
+     uplo = "U", m = nrow(x), n = ncol(x), 
+     alpha = 1, A = s, B = x))
Unit: microseconds
  expr     min       lq     mean   median       uq      max neval
 dense 880.989 950.3225 1114.744 1066.866 1159.311 2783.213   100
  BLAS 858.866 938.6680 1169.839 1016.495 1225.286 3261.633   100

It does not seem to make any difference. Note that you need both the RcppArmadillo and Rcpp packages installed.

Answer 2

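Don't recode matrix multiplication with for loops. Linear algebra libraries are highly optimized for this, and your version will likely be 10 times slower (or worse).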

For matrix computations, you won't gain much (and may even lose) by using RcppArmadillo or RcppEigen.

If you want a real gain, you can change the math library you are using, e.g. use the MKL with Microsoft R Open.
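Before switching libraries it helps to know which BLAS and LAPACK the current R session is linked against. The following sketch uses only base R functions (available in R 3.4.0 and later); the variable names are illustrative, and the reported paths depend entirely on the local installation, so no output is shown.

```r
# Path of the BLAS shared library in use (may be "" if R cannot determine it)
blas_path <- extSoftVersion()[["BLAS"]]

# Path and version of the LAPACK library in use
lapack_path <- La_library()
lapack_ver  <- La_version()

cat("BLAS:          ", blas_path, "\n")
cat("LAPACK:        ", lapack_path, "\n")
cat("LAPACK version:", lapack_ver, "\n")
```

sessionInfo() prints the same BLAS and LAPACK paths alongside the other session details.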

Speed up matrix multiplication when the result is known to be symmetric
