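I have written code that reads the files in a directory and fits a model for each file. My data and files are large, so it takes a long time to run. I would like to parallelize it and run it on a server, but I have no idea how to parallelize it and no experience doing so.
Could someone help me parallelize it? Below is the part of the code that I want to run in parallel: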
max.run<-10000
for (filename in dir(mydirectory))
{
# Loading data
filename = paste(data.dir,filename,sep="/")
dfr=read.table(filename,header=TRUE)
if (ncol(dfr) > 1)
{
y <- as.matrix(dfr[1])
x <- as.matrix(dfr[2:ncol(dfr)])
groupii <- c()
groupiicoeffs <- c()
while (TRUE) {
if (ncol(x) == 0) {
break
}
# Doing the Lasso regression
M <- lars(x,y,type="lasso",normalize=TRUE,intercept=TRUE,use.Gram=TRUE)
#exit()
# If the Cp can not be calculated, only selecting the x most correlated with y
if (is.nan(M$Cp[1])) {
.
.
.
.
.
run <- run + 1
if (run >= max.run) {
break
}
}
.
.
.
.
Answer 0 (score: 0)
You need a few packages before you can start working in parallel.
install.packages("foreach")
install.packages("doParallel")
# 'parallel' ships with base R, so it does not need to be installed
library(foreach)
library(parallel)
library(doParallel)
Then follow my code.
max.run <- 10000
n.cores <- parallel::detectCores() - 1 # Number of cores to use (keep it below the total so the machine stays responsive)
# In this code I will assume n.cores = 3
myCluster <- parallel::makeCluster(n.cores) # Starts n.cores (here 3) new R sessions
# myCluster is the handle that refers to the cluster
doParallel::registerDoParallel(myCluster) # Registers the cluster as the backend for foreach
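To confirm that the backend was registered, you can ask foreach how many workers it sees. This is only a minimal sketch, assuming the foreach and doParallel packages are installed; the cluster size of 2 and the names `cl` and `n.workers` are illustrative and not part of the code above.

```r
library(foreach)
library(doParallel)

cl <- parallel::makeCluster(2)            # illustrative size, not a recommendation
doParallel::registerDoParallel(cl)

foreach::getDoParRegistered()             # TRUE once a parallel backend is registered
n.workers <- foreach::getDoParWorkers()   # number of workers foreach will use
print(n.workers)

parallel::stopCluster(cl)                 # release the worker sessions when done
```

Calling `parallel::stopCluster()` at the end matters on a shared server, because otherwise the worker sessions keep running after your script has finished.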
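When parallelizing, the important decision is where to split the work. In this program I think you should parallelize the 'for' statement, as follows.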
foreach::foreach(filename = dir(mydirectory),
.packages = c("lars"), # list every package the loop body uses; lars() needs "lars"
.combine = rbind
# <- '.combine = rbind' row-binds the values returned by the individual iterations
) %dopar% {
# Loading data
filename = paste(data.dir,filename,sep="/")
dfr=read.table(filename,header=TRUE)
if (ncol(dfr) > 1)
{
y <- as.matrix(dfr[1])
x <- as.matrix(dfr[2:ncol(dfr)])
groupii <- c()
groupiicoeffs <- c()
while (TRUE) {
if (ncol(x) == 0) {
break
}
# Doing the Lasso regression
M <- lars(x,y,type="lasso",normalize=TRUE,intercept=TRUE,use.Gram=TRUE)
#exit()
# If the Cp can not be calculated, only selecting the x most correlated with y
if (is.nan(M$Cp[1])) {
.
.
.
.
.
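result # Placeholder for the object you want returned for this file ***
# Note: foreach returns the value of the last expression in its body, so in the full code 'result' has to be evaluated last. The details are explained below.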
run <- run + 1
if (run >= max.run) {
break
}
}
.
.
.
.
}
}
}
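Because the loop above is elided, here is a small self-contained sketch of the same pattern that you can run as is. It rests on several assumptions: the temporary directory, the generated files, and the names `cl`, `files` and `results` are hypothetical, and `lm()` stands in for the `lars()` fitting loop so that only foreach and doParallel are required.

```r
library(foreach)
library(doParallel)

# Hypothetical setup: write three small data files to a temporary directory
data.dir <- file.path(tempdir(), "foreach_demo")
dir.create(data.dir, showWarnings = FALSE)
set.seed(1)
for (k in 1:3) {
  d <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20))
  write.table(d, file.path(data.dir, paste0("file", k, ".txt")), row.names = FALSE)
}

cl <- parallel::makeCluster(2)            # illustrative cluster size
doParallel::registerDoParallel(cl)

files <- list.files(data.dir, full.names = TRUE)
results <- foreach(f = files, .combine = rbind) %dopar% {
  dfr <- read.table(f, header = TRUE)
  fit <- lm(y ~ ., data = dfr)            # stand-in for the lars() fitting loop
  # The last expression is the value returned for this file
  data.frame(file = basename(f), n.coef = length(coef(fit)))
}

parallel::stopCluster(cl)                 # shut the workers down when finished
print(results)
```

Each file is handled by one worker and yields one row, so `results` ends up with one row per file. Returning a small summary object rather than the full fitted model keeps the amount of data sent back from the workers low.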
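In a foreach statement, the value that each iteration evaluates to (what would otherwise be printed) is collected and bound together. To help you understand, I will show you a simple example.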
foreach(i=1:3,
.combine = rbind) %dopar% {
result = i
result # <- Outside a foreach statement this would be printed to the console; here it is the iteration's return value
}
The output is
         [,1]
result.1    1
result.2    2
result.3    3
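The choice of `.combine` determines the shape of what you get back. As a rough illustration (the names `res.list` and `res.vec` are made up for this sketch, and `%do%` is used instead of `%dopar%` so that it runs sequentially without a cluster):

```r
library(foreach)

# Without .combine, foreach returns a list with one element per iteration
res.list <- foreach(i = 1:3) %do% {
  i^2    # last expression: the value returned for this iteration
}

# With .combine = c, the per-iteration values are concatenated into a vector
res.vec <- foreach(i = 1:3, .combine = c) %do% {
  i^2
}

str(res.list)
print(res.vec)
```

A list is the safer default when each iteration returns something that does not fit into a row, such as a fitted model object.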
If you have any questions, feel free to ask.