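I am trying to use parLapply to parallelize a function across the 4 cores of my machine. My function contains two nested loops that fill some empty columns of a predefined matrix M. However, when I run the code below, I get the following error:
2 nodes produced errors; first error: incorrect number of dimensions
Code: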
require("parallel")
TheData <- list(E, T) # list of 2 matrices of different dimensions, T is longer and wider than E
myfunction <- function(TheData) {
  for (k in 1:length(TheData[[1]][,1])) {
    distance <- matrix(, nrow=length(TheData[[1]][,1]), ncol=1)
    for (j in 1:length(TheData[[2]][,1])) {
      distance[j] <- sqrt((as.numeric(TheData[[2]][j,1])-as.numeric(TheData[[1]][k,2]))^2+(as.numeric(TheData[[2]][j,2])-as.numeric(TheData[[1]][k,1]))^2)
    }
    index <- which(distance == min(distance))
    M[k,4:9] <- c(as.numeric(TheData[[2]][index,1]),as.numeric(TheData[[2]][index,2]),as.numeric(TheData[[2]][index,3]),as.numeric(TheData[[2]][index,4]),as.numeric(TheData[[2]][index,5]),as.numeric(TheData[[2]][index,6]))
    rm(distance)
    gc()
  }
}
n_cores <- 4
Cl = makeCluster(n_cores)
Results <- parLapplyLB(Cl, TheData, myfunction)
# I also tried: Results <- parLapply(Cl, TheData, myfunction)
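Answer 0 (score: 1)
In your example, parLapply iterates over a list of matrices and passes each matrix, one at a time, as the argument to "myfunction". However, "myfunction" appears to expect its argument to be a list of two matrices, which is why the error occurs. I can reproduce the error with: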
> E <- matrix(0, 4, 4)
> E[[1]][,1]
Error in E[[1]][, 1] : incorrect number of dimensions
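I'm not sure what you're really trying to do, but with the current implementation of "myfunction", I would expect you to call parLapply with a list of lists, each containing two matrices, for example: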
TheDataList <- list(list(A,B), list(C,D), list(E,F), list(G,H))
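Passing this as the second argument to parLapply will cause "myfunction" to be called four times, each time with a list containing two matrices.
But your example has another problem. It looks like you expect parLapply to modify the matrix "M" as a side effect, but it can't: the function runs in separate worker processes, so assignments to "M" never reach your main session. I think you should change "myfunction" to return a matrix. parLapply will return the matrices in a list, and you can then bind them together into the desired result.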
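To make the difference concrete, here is a minimal sketch of what the function receives on each call. It uses plain `lapply`, which hands list elements to the function the same way `parLapply` does; the helper `describe_arg` and the zero-filled matrices are made up for illustration.

```r
# Hypothetical helper: report what the function receives on each call
describe_arg <- function(x) {
  if (is.matrix(x)) "a single matrix" else paste("a list of", length(x), "matrices")
}

# Placeholder matrices (assumed shapes, not the asker's data)
E <- matrix(0, 4, 4)
T <- matrix(0, 6, 6)

# list(E, T): the function is called once per matrix
flat <- unlist(lapply(list(E, T), describe_arg))

# A list of pairs: the function is called once per pair
nested <- unlist(lapply(list(list(E, T), list(E, T)), describe_arg))

print(flat)
print(nested)
```

With the flat list, indexing the argument as `x[[1]][, 1]` fails because `x` is already a matrix; with the nested list, that indexing works.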
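As a rough sketch of the "return a matrix, then bind" pattern: the worker function `make_block` and its input are invented for this example and stand in for whatever each worker actually computes.

```r
library(parallel)

# Hypothetical worker function: builds and returns its own block of rows
# instead of assigning into a global matrix
make_block <- function(rows) {
  cbind(rows, rows^2)
}

cl <- makeCluster(2)
# Each call returns a matrix; parLapply collects them in a list
blocks <- parLapply(cl, list(1:3, 4:6), make_block)
stopCluster(cl)

# Bind the per-worker blocks into the final result in the main session
M <- do.call(rbind, blocks)
print(dim(M))
```

The only place "M" is assembled is the main session, so nothing depends on workers sharing state.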
Update
From your comments, I now believe that you basically want to parallelize "myfunction". Here is my attempt at doing that:
library(parallel)
cl <- makeCluster(4)
# For each row of "Exy", find the index of the nearest row of "Txy".
# "Txy" is not an argument: it must be exported to the workers first.
myfunction <- function(Exy) {
  iM <- integer(nrow(Exy))
  for (k in seq_len(nrow(Exy))) {
    distance <- sqrt((Txy[,1] - Exy[k,2])^2 + (Txy[,2] - Exy[k,1])^2)
    iM[k] <- which.min(distance)
  }
  iM
}
# Random example data for testing
T <- matrix(rnorm(150), 10)
E <- matrix(rnorm(120), 10)
# Only export the first two columns of T to the workers
Txy <- T[,1:2]
clusterExport(cl, c('Txy'))
# Parallelize "myfunction" by calling it in parallel on block rows of "E".
ExyList <- parallel:::splitRows(E[,1:2], length(cl))
iM <- do.call('c', clusterApply(cl, ExyList, myfunction))
# Update "M" using data from "T" indexed by "iM":
# row k of "M" gets the row of "T" nearest to row k of "E"
M <- matrix(0, nrow(E), 9) # more fake data
for (k in seq_along(iM)) {
  M[k,4:9] <- T[iM[k], 1:6]
}
print(M)
stopCluster(cl)
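One way to sanity-check the parallel result is to compute the same indices in a single process and compare. The sketch below is only that serial reference; the function name `nearest` and the seed are assumptions for illustration, and the square root is dropped because it does not change which distance is smallest.

```r
# Fake data with the same shapes as in the answer's example
set.seed(1)
T <- matrix(rnorm(150), 10)
E <- matrix(rnorm(120), 10)

# Hypothetical serial reference: for each row of Exy, the index of the
# nearest row of Txy. The columns are swapped (Exy[k, 2] against Txy[, 1])
# exactly as in the code from the question.
nearest <- function(Exy, Txy) {
  vapply(seq_len(nrow(Exy)), function(k) {
    which.min((Txy[, 1] - Exy[k, 2])^2 + (Txy[, 2] - Exy[k, 1])^2)
  }, integer(1))
}
iM_serial <- nearest(E[, 1:2], T[, 1:2])

# Row k of M takes columns 1:6 of the T row nearest to row k of E
M <- matrix(0, nrow(E), 9)
M[, 4:9] <- T[iM_serial, 1:6]
```

If the vector returned by the cluster run is `identical()` to `iM_serial` on the same data, the row splitting and recombination are behaving as intended.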
Note: