How can I make the apply() function faster?

Time: 2016-12-18 19:41:05

Tags: r apply

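I have two matrices. I want to use each column of the first matrix to filter the second matrix (both its rows and its columns), and then find the sum of the filtered subset. I used the following code, and it works fine.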

apply(firstMat,2,function(x) sum(secondMat[x,x]))

However, the dataset is large, and I would like to find an alternative that makes the process faster.

Here is a small reproducible example:

firstMat<-matrix(c(T,F,T,F,F,T,T,F,F,F),nrow=5,ncol=2)
secondMat<-matrix(c(1,0,0,0,1,0,0,0,1,1,1,0,1,0,1,1,1,0,0,0,1,1,1,0,1),nrow=5,ncol=5)

I would be very grateful for any help.

2 Answers:

Answer 0: (score: 1)

You can run the apply function in parallel across the worker processes of a cluster:

firstMat<-matrix(c(T,F,T,F,F,T,T,F,F,F),nrow=5,ncol=2)
secondMat<-matrix(c(1,0,0,0,1,0,0,0,1,1,1,0,1,0,1,1,1,0,0,0,1,1,1,0,1),nrow=5,ncol=5)

# create the cluster
library(doSNOW)
cl <- makeCluster(2, type = "SOCK") # creates one cluster with 2 worker processes
# you can use detectCores() from package `parallel` to check the number of cores in your machine
registerDoSNOW(cl)
clusterExport(cl,list("secondMat")) # secondMat must be exported to each worker because the function uses it there

# Option 1: Using parApply from package `parallel`
library(parallel)
parApply(cl,firstMat,2,function(x) sum(secondMat[x,x]))

# Option 2: Using aaply from package `plyr`
library(plyr)    
aaply(firstMat,2,function(x) sum(secondMat[x,x]),.parallel=T)

stopCluster(cl)

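On the small reproducible example this shows no speed improvement, because the cost of starting the workers and copying the data outweighs the computation itself, but I think that for large matrices both options will be faster than apply.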
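Whether the parallel version actually pays off can be checked by timing both versions on larger inputs. The following is only a rough sketch, not a measured result: the matrix sizes, the seed, and the names `bigFirst` and `bigSecond` are made up for illustration, and it uses only the base `parallel` package rather than doSNOW.

```r
library(parallel)

set.seed(1)                      # arbitrary seed, only for repeatability
n <- 500                         # hypothetical number of rows/columns of the second matrix
k <- 200                         # hypothetical number of filter columns
bigSecond <- matrix(rbinom(n * n, 1, 0.5), nrow = n)
bigFirst  <- matrix(runif(n * k) < 0.5, nrow = n)   # logical filter matrix

cl <- makeCluster(2)             # one cluster with 2 worker processes
clusterExport(cl, "bigSecond")   # each worker needs its own copy of the data

# time the serial and the parallel version of the same computation
t_serial   <- system.time(r1 <- apply(bigFirst, 2, function(x) sum(bigSecond[x, x])))
t_parallel <- system.time(r2 <- parApply(cl, bigFirst, 2, function(x) sum(bigSecond[x, x])))

stopCluster(cl)

stopifnot(isTRUE(all.equal(r1, r2)))   # both versions must give the same sums
print(c(serial = t_serial[["elapsed"]], parallel = t_parallel[["elapsed"]]))
```

The elapsed times depend on the machine and on the sizes chosen, so it is worth trying a few values of `n` and `k` close to the real data before committing to the parallel version.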

Answer 1: (score: 1)

Maybe your BLAS is faster than the explicit loop:

diag( t(firstMat) %*% secondMat %*% firstMat )
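This works because a logical column x acts as a 0/1 indicator vector, so sum(secondMat[x,x]) equals the quadratic form t(x) %*% secondMat %*% x, which is one diagonal entry of the product above. The sketch below checks that on the question's example. It also tries a variant, which is my own suggestion rather than part of this answer, that computes only the diagonal entries with colSums() and so avoids building the full k-by-k product when firstMat has many columns.

```r
firstMat  <- matrix(c(T,F,T,F,F,T,T,F,F,F), nrow = 5, ncol = 2)
secondMat <- matrix(c(1,0,0,0,1,0,0,0,1,1,1,0,1,0,1,1,1,0,0,0,1,1,1,0,1), nrow = 5, ncol = 5)

# original approach from the question
viaApply <- apply(firstMat, 2, function(x) sum(secondMat[x, x]))

# matrix-product approach from this answer
viaDiag <- diag(t(firstMat) %*% secondMat %*% firstMat)

# variant (an assumption, not from the answer): compute only the diagonal,
# column j gives sum_i x_i * (secondMat %*% x)_i
viaColSums <- colSums(firstMat * (secondMat %*% firstMat))

stopifnot(isTRUE(all.equal(viaApply, viaDiag)),
          isTRUE(all.equal(viaApply, viaColSums)))
print(viaApply)
```

All three give the same sums on the example. Which one is fastest on large data depends on the BLAS in use and on how sparse the filters are, so it should be timed rather than assumed.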