Following up on a previous post, Large SpMat object with RcppArmadillo, I decided to use Rcpp to compute a large matrix (~600,000 rows x 11 columns).
I have installed Rcpp and RcppArmadillo:
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X 10.11.6 (El Capitan)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RcppArmadillo_0.7.500.0.0 Rcpp_0.12.7 cluster_2.0.4 skmeans_0.2-8
[5] ggdendro_0.1-20 ggplot2_2.1.0 lsa_0.73.1 SnowballC_0.5.1
[9] data.table_1.9.6 jsonlite_1.1 purrr_0.2.2 stringi_1.1.2
[13] dplyr_0.5.0 plyr_1.8.4
loaded via a namespace (and not attached):
[1] assertthat_0.1 slam_0.1-38 MASS_7.3-45 chron_2.3-47 grid_3.3.1 R6_2.2.0 gtable_0.2.0
[8] DBI_0.5-1 magrittr_1.5 scales_0.4.0 tools_3.3.1 munsell_0.4.3 clue_0.3-51 colorspace_1.2-7
[15] tibble_1.2
With a small example such as mtcars, this works perfectly:
library(lsa)
x <- as.matrix(mtcars)
cosine(t(x))
This is the cosine function from lsa:
cosR <- function(x) {
  co <- array(0, c(ncol(x), ncol(x)))
  ## f <- colnames(x)
  ## dimnames(co) <- list(f, f)
  for (i in 2:ncol(x)) {
    for (j in 1:(i - 1)) {
      co[i,j] <- crossprod(x[,i], x[,j])/
        sqrt(crossprod(x[,i]) * crossprod(x[,j]))
    }
  }
  co <- co + t(co)
  diag(co) <- 1
  return(as.matrix(co))
}
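For reference, the quantity the double loop above fills in can be written as a formula. Here $x_i$ denotes the $i$-th column of x (notation introduced only for this note):

```latex
\[
\mathrm{co}_{ij}
  = \frac{x_i^{\top} x_j}{\sqrt{(x_i^{\top} x_i)\,(x_j^{\top} x_j)}}
  = \frac{x_i^{\top} x_j}{\lVert x_i \rVert_2 \,\lVert x_j \rVert_2},
\qquad
\mathrm{co}_{ii} = 1 .
\]
```

The result is a square matrix with one row and one column per column of x, so its size grows quadratically with ncol(x).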
The equivalent in Rcpp is:
library(Rcpp)
library(RcppArmadillo)
cppFunction(depends='RcppArmadillo',
            code="NumericMatrix cosCpp(NumericMatrix Xr) {
  int n = Xr.nrow(), k = Xr.ncol();
  arma::mat X(Xr.begin(), n, k, false); // reuses memory and avoids extra copy
  arma::mat Y = arma::trans(X) * X; // matrix product
  arma::mat res = Y / (arma::sqrt(arma::diagvec(Y)) * arma::trans(arma::sqrt(arma::diagvec(Y))));
  return Rcpp::wrap(res);
}")
You can check that the two functions are equivalent:
all.equal(cosCpp(x),cosR(x))
[1] TRUE
But when I run this on my data after loading Rcpp, I get:
x <- as.matrix(my_data)
x <- t(my_data)
y <- cosCpp(x)
error: Mat::init(): requested size is too large
Error in eval(substitute(expr), envir, enclos) :
  Mat::init(): requested size is too large
I then modified my function, loading it with sourceCpp("/myfolder/my_function.cpp") before running it. The content of my_function.cpp is:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::export]]
arma::sp_mat cosine_rcpp(
    const arma::mat & X
) {
  int k = X.n_cols;
  arma::sp_mat ans(k,k);
  for (int i=0;i<k;i++)
    for (int j=i;j<k;j++) {
      // X(i) x X(j)' / sqrt(sum(X^2) * sum(Y^2))
      ans.at(i,j) = arma::norm_dot(X.col(i), X.col(j));
    }
  return ans;
}
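To make the inner step of the loop above concrete, here is a minimal standard-library sketch of the normalised dot product that arma::norm_dot computes for a pair of columns. The helper name cosine_similarity is hypothetical and not part of Armadillo; the sketch assumes both vectors have the same length and that neither is all zeros.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an Armadillo function): cosine similarity of two
// vectors, i.e. dot(a, b) / (||a|| * ||b||).
// Assumes a.size() == b.size() and that neither vector is all zeros.
double cosine_similarity(const std::vector<double>& a,
                         const std::vector<double>& b) {
    double dot = 0.0;     // running sum of a[i] * b[i]
    double norm_a = 0.0;  // running sum of a[i]^2
    double norm_b = 0.0;  // running sum of b[i]^2
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        norm_a += a[i] * a[i];
        norm_b += b[i] * b[i];
    }
    return dot / std::sqrt(norm_a * norm_b);
}
```

Note that the loop in cosine_rcpp starts j at i, so it fills only the diagonal and the upper triangle of ans.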
Answer 0 (score: 7)
RcppArmadillo is an Rcpp-only package because of the contents of its /src directory. To enable C++11, use // [[Rcpp::plugins(cpp11)]].
ARMA_64BIT_WORD is not defined. To define it, add #define ARMA_64BIT_WORD 1 before #include <RcppArmadillo.h>.
Using sourceCpp():
#define ARMA_64BIT_WORD 1
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
arma::mat cosCpp(const arma::mat& X) {
  arma::mat Y = arma::trans(X) * X; // matrix product
  arma::mat res = Y / (arma::sqrt(arma::diagvec(Y)) * arma::trans(arma::sqrt(arma::diagvec(Y))));
  return res;
}
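The arithmetic behind the "requested size is too large" error can be sketched as follows. This assumes the input has exactly 600,000 columns after the transpose (the question says approximately 600,000) and that the index type is 32 bits wide when ARMA_64BIT_WORD is not defined. The helper names are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helpers for illustration; none of these are Armadillo functions.

// Number of elements in a dense k-by-k similarity matrix.
constexpr std::uint64_t dense_elements(std::uint64_t k) {
    return k * k;
}

// True if an element count can be represented by a 32-bit unsigned index.
constexpr bool fits_in_32bit_index(std::uint64_t n_elem) {
    return n_elem <= std::numeric_limits<std::uint32_t>::max();
}

// Bytes needed to store a dense k-by-k matrix of doubles.
constexpr std::uint64_t dense_bytes(std::uint64_t k) {
    return dense_elements(k) * sizeof(double);
}
```

Under these assumptions, dense_elements(600000) is 3.6e11, far above the 32-bit limit of roughly 4.29e9, which is consistent with the error above. Defining ARMA_64BIT_WORD lifts the index limit, but with 8-byte doubles the dense result would still need about 2.88e12 bytes, so it is unlikely to fit in memory on most machines regardless of the index width.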
To define it for use within a package, add the following to /src/Makevars{.win}:
PKG_CPPFLAGS = -DARMA_64BIT_WORD=1