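I'm new to R and having trouble vectorizing a nested loop that is particularly slow. The loop goes through a list of centers (vectors stored as the rows of clusterCenters) and finds the distance between each of these vectors and each row of the matrix clusterMembers below. I know this needs to be vectorized for speed, but I can't figure out which of the apply functions is appropriate or how to use it.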
clusterCenters <- matrix(runif(10000),nrow=100)
clusterMembers <- matrix(runif(400000),nrow=4000)
features <- matrix(0,(dim(clusterMembers)[1]),(dim(clusterCenters)[1]))
for(c in 1:dim(clusterCenters)[1]){
  center <- clusterCenters[c,]
  for(v in 1:(dim(clusterMembers)[1])){
    vector <- clusterMembers[v,]
    features[v,c] <- sqrt(sum((center - vector)^2))
  }
}
Thanks for your help.
Answer 0 (score: 2)
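You can take advantage of R's recycling rules to make this faster. But you have to know, and account for, the fact that R stores matrices in column-major order. You can do that by transposing clusterMembers; the center vector is then recycled down the columns of t(clusterMembers), so each of its elements lines up with the matching coordinate of every member.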
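The recycling behaviour described above can be seen on a toy example. This is only a sketch: the names `m`, `v`, `wrong`, `right` and `d2` are made up for illustration and are not part of the question's data.

```r
# Hypothetical toy data: 2 points (rows) in 3 dimensions
m <- matrix(1:6, nrow = 2)   # rows are (1, 3, 5) and (2, 4, 6)
v <- c(10, 20, 30)           # one "center" with 3 coordinates

# R stores m column by column (1 2 3 4 5 6) and recycles v along that order,
# so v[2] gets paired with m[2, 1] instead of m[1, 2]: coordinates are mixed up.
wrong <- v - m

# After transposing, each column of t(m) is one point, so v lines up with
# that point's coordinates as it is recycled down the columns.
right <- v - t(m)

# Squared distance from v to each row of m
d2 <- colSums(right^2)
```

Here `d2[1]` should match `sum((v - m[1, ])^2)`, which is the quantity the inner loop in the question computes one row at a time.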
set.seed(21)
clusterCenters <- matrix(runif(10000),nrow=100)
clusterMembers <- matrix(runif(400000),nrow=4000)
# your original code in function form
seven <- function() {
  features <- matrix(0,(dim(clusterMembers)[1]),(dim(clusterCenters)[1]))
  for(c in 1:dim(clusterCenters)[1]){
    center <- clusterCenters[c,]
    for(v in 1:(dim(clusterMembers)[1])){
      vector <- clusterMembers[v,]
      features[v,c] <- sqrt(sum((center - vector)^2))
    }
  }
  features
}
# my fancy function
josh <- function() {
  tcm <- t(clusterMembers)
  Features <- matrix(0,ncol(tcm),nrow(clusterCenters))
  for(i in 1:nrow(clusterCenters)) {
    # clusterCenters[i,] returns a vector because drop=TRUE by default
    Features[,i] <- colSums((clusterCenters[i,]-tcm)^2)
  }
  Features <- sqrt(Features) # outside the loop to avoid function calls
}
system.time(seven())
# user system elapsed
# 2.7 0.0 2.7
system.time(josh())
# user system elapsed
# 0.28 0.11 0.39
identical(seven(),josh())
# [1] TRUE
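If the remaining loop over centers is still too slow, one possible alternative (not part of the answer above) is to drop the loop entirely by expanding the squared distance as |a - b|^2 = |a|^2 + |b|^2 - 2 a.b and computing all the dot products with one matrix product. The following is a sketch under that assumption; `noLoop` and `direct` are hypothetical names introduced here.

```r
set.seed(21)
clusterCenters <- matrix(runif(10000), nrow = 100)
clusterMembers <- matrix(runif(400000), nrow = 4000)

noLoop <- function() {
  # squared norm of every member (length 4000) and every center (length 100)
  m2 <- rowSums(clusterMembers^2)
  c2 <- rowSums(clusterCenters^2)
  # 4000 x 100 matrix of dot products between members and centers
  cross <- tcrossprod(clusterMembers, clusterCenters)
  # outer(m2, c2, "+")[v, c] is m2[v] + c2[c]
  d2 <- outer(m2, c2, "+") - 2 * cross
  # rounding error can push tiny values slightly below zero
  sqrt(pmax(d2, 0))
}

# reference value computed directly for one (member, center) pair
direct <- sqrt(sum((clusterCenters[7, ] - clusterMembers[42, ])^2))
all.equal(noLoop()[42, 7], direct)
```

Because the arithmetic is done in a different order, the result should agree with the loop-based versions only up to floating-point tolerance, so compare with all.equal() rather than identical(). For distance matrices, this expansion can also lose some precision when two points are very close together.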