Slow nested loop in R

Time: 2013-03-04 17:29:30

Tags: performance r loops nested vectorization

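I'm new to R and can't work out how to vectorize a nested loop that is particularly slow. The loop iterates over a list of centers (vectors stored in a matrix, clusterCenters in the code below) and computes the distance between each center and every row of the array called x below (clusterMembers in the code). I know this needs to be vectorized for speed, but I can't figure out which apply-family function fits or how to use it.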

clusterCenters <- matrix(runif(10000),nrow=100)
clusterMembers <- matrix(runif(400000),nrow=4000)

features <- matrix(0,(dim(clusterMembers)[1]),(dim(clusterCenters)[1]))

for(c in 1:dim(clusterCenters)[1]){
  center <- clusterCenters[c,]
  for(v in 1:(dim(clusterMembers)[1])){
    vector <- clusterMembers[v,]
    features[v,c] <- sqrt(sum((center - vector)^2))
  }
}

Thanks for your help.

1 answer:

Answer 0: (score: 2)

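You can take advantage of R's recycling rules to speed this up. But you have to know, and account for, the fact that R stores matrices in column-major order. You can do that by transposing clusterMembers; the center vector is then recycled down the columns of t(clusterMembers), so each coordinate of the center lines up with the matching coordinate of each member.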
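To see why the transpose matters, here is a minimal sketch on toy data. The names m, center, d_good and d_bad are made up for illustration and are not part of the question's code. It compares recycling against the transposed matrix with recycling against the untransposed one.

```r
# Toy data (hypothetical): 2 members as rows, 3 coordinates each.
m <- matrix(c(1, 2, 3,
              4, 5, 6), nrow = 2, byrow = TRUE)
center <- c(1, 2, 3)  # equal to the first member

# t(m) is 3 x 2, so each column holds one member. The length-3 center is
# recycled down each column, so coordinates line up correctly.
d_good <- sqrt(colSums((center - t(m))^2))

# Without the transpose, recycling still runs down the columns of m, but
# those columns have length 2, so center's coordinates get paired with the
# wrong entries. R gives no warning because 6 is a multiple of 3.
d_bad <- sqrt(rowSums((center - m)^2))

print(d_good)  # first entry is 0, since center equals member 1
print(d_bad)   # differs from d_good
```

The silent misalignment in the second case is the reason the answer transposes clusterMembers once, up front, rather than subtracting from it directly.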

set.seed(21)
clusterCenters <- matrix(runif(10000),nrow=100)
clusterMembers <- matrix(runif(400000),nrow=4000)
# your original code in function form
seven <- function() {
  features <- matrix(0,(dim(clusterMembers)[1]),(dim(clusterCenters)[1]))
  for(c in 1:dim(clusterCenters)[1]){
    center <- clusterCenters[c,]
    for(v in 1:(dim(clusterMembers)[1])){
      vector <- clusterMembers[v,]
      features[v,c] <- sqrt(sum((center - vector)^2))
    }
  }
  features
}
# my fancy function
josh <- function() {
  tcm <- t(clusterMembers)
  Features <- matrix(0,ncol(tcm),nrow(clusterCenters))
  for(i in 1:nrow(clusterCenters)) {
    # clusterCenters[i,] returns a vector because drop=TRUE by default
    Features[,i] <- colSums((clusterCenters[i,]-tcm)^2)
  }
  sqrt(Features)  # one sqrt call outside the loop; also the function's return value
}
system.time(seven())
#    user  system elapsed 
#     2.7     0.0     2.7 
system.time(josh())
#    user  system elapsed 
#    0.28    0.11    0.39 
identical(seven(),josh())
# [1] TRUE
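If exact, bit-for-bit equality with the original loop is not required, the remaining loop over centers can probably be removed as well. The sketch below is not from the answer above: it swaps in the identity |a - b|^2 = |a|^2 + |b|^2 - 2 a.b and a single matrix product. The function name dist_matmul and the variables features2 and ref are hypothetical, and no timings are claimed here.

```r
set.seed(21)
clusterCenters <- matrix(runif(10000), nrow = 100)
clusterMembers <- matrix(runif(400000), nrow = 4000)

# Hypothetical helper: Euclidean distances between every row of `members`
# and every row of `centers`, returned as an nrow(members) x nrow(centers) matrix.
dist_matmul <- function(members, centers) {
  m2 <- rowSums(members^2)  # squared norm of each member
  c2 <- rowSums(centers^2)  # squared norm of each center
  # outer() builds the |a|^2 + |b|^2 grid; tcrossprod() gives members %*% t(centers)
  d2 <- outer(m2, c2, "+") - 2 * tcrossprod(members, centers)
  # Rounding can leave tiny negative values; clamp them before taking sqrt
  sqrt(pmax(d2, 0))
}

features2 <- dist_matmul(clusterMembers, clusterCenters)

# Cross-check one column against the recycling approach from the answer
ref <- sqrt(colSums((clusterCenters[1, ] - t(clusterMembers))^2))
print(all.equal(features2[, 1], ref))
```

Because the arithmetic is ordered differently, the result should agree with the loop versions only up to floating-point tolerance, so compare with all.equal() rather than identical().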