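How can I make this simple script run on multiple cores?

Asked: 2014-05-06 17:53:53

Tags: r parallel-processing

I have written code that reads the files in a directory and fits a model to each file. My data and files are huge, so it takes a long time to run. I would like to parallelize it and run it on a server, but I have no idea how to do that and no experience with parallelization.

Could someone help me parallelize it?

Here is the part of the code I want to parallelize: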

max.run<-10000
for (filename in dir(mydirectory))
{
    # Loading data

    filename = paste(data.dir,filename,sep="/")
    dfr=read.table(filename,header=TRUE)

    if (ncol(dfr) > 1)
    {
        y <- as.matrix(dfr[1])
        x <- as.matrix(dfr[2:ncol(dfr)])
        groupii <- c()
        groupiicoeffs <- c()

        while (TRUE) {

            if (ncol(x) == 0) {
                break
            }

            # Doing the Lasso regression

            M <- lars(x,y,type="lasso",normalize=TRUE,intercept=TRUE,use.Gram=TRUE)
            #exit()

            # If the Cp cannot be calculated, only select the x most correlated with y

            if (is.nan(M$Cp[1])) {
.
.
.
.
.

                run <- run + 1
                if (run >= max.run) {
                    break
                }
            }

.
.
.
.

1 Answer:

Answer 0 (score: 0)

You need a few packages in place before you can start working in parallel.

# 'parallel' ships with base R, so only these two need installing
install.packages("foreach")
install.packages("doParallel")
library(foreach)
library(parallel)
library(doParallel)

Then follow my code.

max.run <- 10000
n.cores = parallel::detectCores() - 1 # Number of cores to use (leave one free for the system)
# In this code I will assume n.cores = 3
myCluster = parallel::makeCluster(n.cores) # This starts n.cores (3) new R sessions
# Think of myCluster as the handle for our cluster.
doParallel::registerDoParallel(myCluster) # This registers the cluster as the backend for foreach
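To check that the cluster really was registered, and to clean up afterwards, something like the following sketch may help. It assumes doParallel is installed; `n.workers` is just an illustrative name, and the `max(1, ...)` guard is my own addition for machines where only one core is detected.

```r
library(doParallel)  # also attaches foreach and parallel

# Leave one core free, but never ask for fewer than one worker.
n.cores <- max(1, parallel::detectCores() - 1, na.rm = TRUE)

myCluster <- parallel::makeCluster(n.cores)
doParallel::registerDoParallel(myCluster)

# Ask foreach how many workers it will use with %dopar%.
n.workers <- foreach::getDoParWorkers()
print(n.workers)

# ... parallel work goes here ...

# Shut the worker sessions down when finished.
parallel::stopCluster(myCluster)
```

If `n.workers` matches `n.cores`, `%dopar%` will use the cluster rather than falling back to sequential execution.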

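In parallel processing, the important decision is where to parallelize. In this program I think you should parallelize the 'for' statement, as follows.

Modified code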

foreach::foreach(filename = dir(mydirectory),
                 .packages = c("lars"), # packages the workers need; lars() is called below
                 .combine = rbind
                 # '.combine' binds together the results produced by each iteration
) %dopar% {
  # Loading data

  filename = paste(data.dir,filename,sep="/")
  dfr=read.table(filename,header=TRUE)

  if (ncol(dfr) > 1)
  {
    y <- as.matrix(dfr[1])
    x <- as.matrix(dfr[2:ncol(dfr)])
    groupii <- c()
    groupiicoeffs <- c()

    while (TRUE) {

      if (ncol(x) == 0) {
        break
      }

      # Doing the Lasso regression

      M <- lars(x,y,type="lasso",normalize=TRUE,intercept=TRUE,use.Gram=TRUE)
      #exit()

      # If the Cp cannot be calculated, only select the x most correlated with y

      if (is.nan(M$Cp[1])) {
        .
        .
        .
        .
        .

        run <- run + 1
        if (run >= max.run) {

          break
        }
      }

      .
      .
      .
      .
    }

    result # I think this is where the result goes ***
    # foreach collects the value of the last expression evaluated in the body,
    # so the per-file result has to come last; I explain the details below
  }


}

parallel::stopCluster(myCluster) # Shut down the worker sessions when you are done
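Since the code above has elided parts and cannot be run as posted, here is a small self-contained sketch of the same per-file pattern. Everything in it is made up for illustration: the temporary directory `demo.dir`, the three generated files, and `lm()` standing in for the `lars()` loop so that no extra package is needed on the workers.

```r
library(doParallel)

# Hypothetical demo data: three small tables in a temporary directory.
demo.dir <- file.path(tempdir(), "foreach_demo")
dir.create(demo.dir, showWarnings = FALSE)
set.seed(1)
for (i in 1:3) {
  demo <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20))
  write.table(demo, file.path(demo.dir, paste0("file", i, ".txt")),
              row.names = FALSE)
}

myCluster <- parallel::makeCluster(2)
doParallel::registerDoParallel(myCluster)

# One iteration per file; each iteration runs in a worker session.
results <- foreach(path = list.files(demo.dir, full.names = TRUE),
                   .combine = rbind) %dopar% {
  dfr <- read.table(path, header = TRUE)
  fit <- lm(y ~ ., data = dfr)  # stand-in for the lars() fitting loop
  # The last expression is what foreach collects for this file.
  data.frame(file = basename(path), t(coef(fit)))
}

parallel::stopCluster(myCluster)
print(results)
```

Each file contributes one row to `results`, so the combined data frame has one row per file. With real data you would replace the `lm()` line with your own fitting code and pass `.packages = "lars"`.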

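About the result

In a foreach statement, the value that would normally be printed is collected from every iteration and bound together. To make this easier to understand, here is a simple example.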

foreach(i = 1:3,
        .combine = rbind) %dopar% {
  result = i
  result # <- Outside a foreach statement, this would print result to the console
}

The output is

         [,1]
result.1    1
result.2    2
result.3    3
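For comparison, here is a rough sketch of what happens with other `.combine` settings. It assumes doParallel is installed and uses a two-worker cluster; `squares.list` and `squares.vec` are illustrative names only.

```r
library(doParallel)

myCluster <- parallel::makeCluster(2)
doParallel::registerDoParallel(myCluster)

# Without .combine, foreach returns a list with one element per iteration.
squares.list <- foreach(i = 1:3) %dopar% {
  i^2
}

# With .combine = c, the per-iteration values are concatenated into a vector.
squares.vec <- foreach(i = 1:3, .combine = c) %dopar% {
  i^2
}

parallel::stopCluster(myCluster)

str(squares.list)
print(squares.vec)
```

A list is the safer choice when each file produces a fitted model object rather than a single row of numbers, because `rbind` only makes sense for results that share the same shape.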

If you have any questions, feel free to ask.