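I have the following function (funtest) to test whether a particular vector exists in a matrix. The vector will always have length 2, and the matrix will always have two columns. The function works fine; I just want to make it faster (ideally much faster), because my matrices can have hundreds to thousands of rows.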
x = c(1,2)
set.seed(100)
m <- matrix(sample(c(1,-2,3,4), 500*2, replace=TRUE), ncol=2)
funtest(m,x)
[1] FALSE
This is the current speed:
library(microbenchmark)
microbenchmark(funtest(m, x), times=100)
Unit: milliseconds
          expr      min       lq     mean   median       uq      max neval
 funtest(m, x) 1.501247 1.536157 1.674668 1.567826 1.708293 2.900046   100
Here is the function:
funtest = function(m, x) {
out = any(apply(m,1,function(n,x) all(n==x),x=x))
return(out)
}
Answer 0 (score: 3)
How about:
paste(x[1], x[2], sep='&') %in% paste(m[,1], m[,2], sep='&')
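This should be super efficient! It is based on matching. Once the first match is found, no further searching is done!
But I am sure this is not the fastest. The optimal solution would be to write this operation in C code using a single while loop. However, the potential speedup factor should not exceed 2.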
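A minimal sketch of the one-liner above wrapped as a function. The name pasteFn is borrowed from the benchmark labels in the next answer, and I am assuming it simply wraps this expression; m_demo is a small made-up matrix, not the data from the question.

```r
# Assumed wrapper: 'pasteFn' is the label used in the benchmarks of the next
# answer; its exact definition is not shown there, so this is a guess.
pasteFn <- function(m, x) {
  # Collapse the target vector and every row of m into one key such as "1&2".
  key_x  <- paste(x[1], x[2], sep = "&")
  keys_m <- paste(m[, 1], m[, 2], sep = "&")
  # A single membership test on the keys replaces the row-wise apply().
  key_x %in% keys_m
}

# Hypothetical demo data; rows are (1, 2), (3, 4), (4, 1).
m_demo <- matrix(c(1, 3, 4,
                   2, 4, 1), ncol = 2)

pasteFn(m_demo, c(1, 2))  # the first row is (1, 2)
pasteFn(m_demo, c(2, 1))  # element order matters, so (2, 1) is a different row
```

Note that every row is still converted to a string before matching, so the cost of paste() is paid for the whole matrix even when the match is in the first row.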
Answer 1 (score: 3)
Here is an Rcpp (specifically RcppArmadillo) approach. Benchmarks are given at the end:
The benchmarks are given after the code. (Edit: I have added a benchmark for @zheyuan-li's very simple solution; it is called pasteFn.)
# Import the relevant packages (All for compiling the C++ code inline)
library(Rcpp)
library(RcppArmadillo)
library(inline)
# We need to include these namespaces in the C++ code
includes <- '
using namespace Rcpp;
using namespace arma;
'
# This is the main C++ function
# We cast 'm' as an Armadillo matrix 'm1' and compute the number of rows 'numRows'
# We cast 'x' as a row vector 'x1'
# We then loop through the rows of the matrix
# As soon as we find a matching row (anyEqual = TRUE), we stop and return TRUE
# If no matching row is found, then anyEqual = FALSE and we return FALSE
# Note: Within the for loop, we do an elementwise comparison of a row of m1 to x1
# If the row is equal to x1, then the sum of the elementwise comparison should equal the number of elements of x1
src <- '
mat m1 = as<mat>(m);
int numRows = m1.n_rows;
rowvec x1 = as<rowvec>(x);
bool anyEqual = FALSE;
for (int i = 0; i < numRows && !anyEqual; i++){
anyEqual = (sum(m1.row(i) == x1) == x1.size());
}
return(wrap(anyEqual));
'
# Here, we compile the function above
# Do this once (in a given R session) and use it as many times as desired
rcppFn <- cxxfunction(signature(m="numeric", x="numeric"), src, plugin='RcppArmadillo', includes)
# Your function is called funtest
# Rcpp function is rcppFn
# Zheyuan's solution is pasteFn
microbenchmark(funtest(m, x), rcppFn(m, x), pasteFn(m, x), times=100, unit = "ms")
Unit: milliseconds
          expr      min        lq       mean    median        uq      max neval
 funtest(m, x) 1.127903 1.1984755 1.30559130 1.2514455 1.3431040 2.641258   100
  rcppFn(m, x) 0.005420 0.0061355 0.00879676 0.0073660 0.0084130 0.030305   100
 pasteFn(m, x) 0.741269 0.7610905 0.79174042 0.7752145 0.8228895 0.894389   100
Edit: If you want to use a matrix 'x' instead, the source code can be adapted to check, for each row of x, whether that row exists in m. It is very similar to the original code, just with an extra for loop. It returns 1 or 0 depending on whether there is a match (I don't have enough experience with RcppArmadillo to create a bool vector).
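The row-by-row check for a matrix 'x' can be sketched in plain R. This is not the RcppArmadillo code from this answer, only an illustration of the same logic; the helper name rowsInMatrix and the demo data are hypothetical.

```r
# Hypothetical helper: for each row of x, return 1 if that row occurs in m, else 0.
# Plain-R analogue of the "extra for loop" idea, not the original C++ source.
rowsInMatrix <- function(m, x) {
  out <- integer(nrow(x))
  for (j in seq_len(nrow(x))) {
    # Compare both columns of m against row j of x in one vectorised step.
    out[j] <- as.integer(any(m[, 1] == x[j, 1] & m[, 2] == x[j, 2]))
  }
  out
}

# Made-up demo data; rows of m_demo are (1, 2), (3, 4), (4, 1).
m_demo <- matrix(c(1, 3, 4,
                   2, 4, 1), ncol = 2)
x_demo <- rbind(c(1, 2), c(2, 1), c(4, 1))

rowsInMatrix(m_demo, x_demo)  # one 0/1 entry per row of x_demo
```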
Answer 2 (score: 3)
base::bitwXor() produces 0 for a match between two integers.
Note: bitwXor() only works for integers.
Edit: added the comparison of bitwXor with 0, and added a data.table solution.
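To illustrate the XOR idea, here is a small sketch using made-up integer data (m_demo and x_demo are hypothetical, not the matrix from the question). XOR of two equal integers has no bits set, so comparing the result with 0 acts as an equality test that can be applied to a whole column at once.

```r
bitwXor(5L, 5L)  # equal integers: no differing bits
bitwXor(5L, 4L)  # differing integers: at least one bit is set

# Made-up demo data; rows are (1, 2), (3, 4), (4, 1).
m_demo <- matrix(c(1L, 3L, 4L,
                   2L, 4L, 1L), ncol = 2)
x_demo <- c(1L, 2L)

# A row matches only if both columns XOR to 0 against the target.
row_match <- (bitwXor(m_demo[, 1], x_demo[1]) == 0) &
             (bitwXor(m_demo[, 2], x_demo[2]) == 0)
any(row_match)
```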
data.table solution (fun4) and benchmarks:
library(microbenchmark)
set.seed(100)
m <- matrix(sample(c(1,-2,3,4), 500*2, replace=TRUE), ncol=2)
fun1 <- function(m,x) {any(apply(m,1,function(n,x) all(n==x),x=x))}
fun2 <- function(m,x) {paste(x[1], x[2], sep='&') %in% paste(m[,1], m[,2], sep='&')}
fun3 <- function(m,x) {any((bitwXor(m[,1], x[1]) == 0) & (bitwXor(m[,2], x[2]) == 0))}
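library(data.table) # needed for fun4
# setDT() does not accept a matrix, so convert with as.data.table(); an unnamed matrix gets columns V1, V2
fun4 <- function(m,x) {as.data.table(m)[V1 == x[1] & V2 == x[2], .N > 0]}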
x <- c(1,2)
microbenchmark(fun1(m,x), # @user3067923
fun2(m,x), # @Zheyuan Li
rcppFn(m, x), # @jav
fun3(m,x),
times = 1000)
# Unit: microseconds
# expr min lq mean median uq max neval
# fun1(m, x) 1802.483 1920.007 2156.93459 1995.865 2094.820 9915.013 1000
# fun2(m, x) 1540.716 1602.534 1674.39556 1641.256 1702.848 2832.344 1000
# rcppFn(m, x) 14.040 16.305 23.43586 21.739 29.439 95.107 1000
# fun3(m, x) 70.650 76.992 86.36290 82.879 88.766 314.303 1000
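Before comparing timings, it may be worth confirming that the base-R variants agree with each other. The sketch below repeats fun1 to fun3 from above and runs them on a small made-up matrix (m_demo is hypothetical), once with a row that is present and once with a row that is absent.

```r
fun1 <- function(m, x) any(apply(m, 1, function(n, x) all(n == x), x = x))
fun2 <- function(m, x) paste(x[1], x[2], sep = "&") %in% paste(m[, 1], m[, 2], sep = "&")
fun3 <- function(m, x) any((bitwXor(m[, 1], x[1]) == 0) & (bitwXor(m[, 2], x[2]) == 0))

# Made-up demo data; rows are (1, 4), (-2, -2), (3, 1).
m_demo <- matrix(c(1, -2, 3,
                   4, -2, 1), ncol = 2)

# One column per function, one row per test case.
results <- sapply(
  list(fun1 = fun1, fun2 = fun2, fun3 = fun3),
  function(f) c(present = f(m_demo, c(-2, -2)), absent = f(m_demo, c(1, 2)))
)
results
```

All three columns should agree row by row; if they do not, the timings are comparing functions that compute different things.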