Running an R script on multiple cores in Linux

Asked: 2017-09-15 16:44:18

Tags: r performance parallel-processing

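I have a function called "my_function" that I want to run in parallel on multiple cores in Linux using mclapply. I have 10 cty_id values and need to run 20 years for each cty_id. How can I use mclapply to run this quickly on 4 cores? I have tested my function by running one county and one year at a time, and it works fine. However, I want to speed up the process instead of manually changing the year and cty_id each time.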

cty_id <- c(205,15,37,59,25,133,11,23,21,19)
val_yr <- c(1998:2017)

my_function <- function(cty_id,val_yr) {

<do something here> ()

}

My code is below, but it does not get the job done and crashes.

library("parallel")
mcapply(c(205,15,37,59,25,133,11,23,21,19),FUN=my_function, val_yr=years[1998:2017], 4L)

Can someone help me run this faster?

Please tell me which global variables I need to define.

Modified R code file below:
in_file11 <- 'PLSS_KS_All_1999_2017.txt' 
in_file12 <- 'PLSS_KS_All_WeeklySM_1998_2017_BILINEAR.txt' 
in_file13 <- 'PRISM_WeeklyPrcp_Sum_800m_1998_2017_BILINEAR.txt'

in_data11 <- fread(in_file11,drop = 1)
in_data12 <- fread(in_file12,drop = 1)
in_data13 <- fread(in_file13,drop = 1)

in_datan <- as.data.table(full_join(in_data12, in_data13))
in_data1 <- as.data.table(full_join(in_data11, in_datan))

in_file2 <- 'KS_pp_Wheat_hist_YieldID_1998_2017.csv' 
in_file3 <- 'All_counties_1999_2017.csv'

in_data2 <- fread(in_file2)
in_data3 <- fread(in_file3)

years <- c(1998:2017)
st_id <- c(15)  
crop_id <- c(11)

my_function <- function(cty_id,val_yr) {

<do something here> ()

}


registerDoFuture()
plan(multiprocess)
num.cores <- detectCores()-1
cluztrr <- makeCluster(num.cores)
registerDoParallel(cl = cluztrr)

plan(cluster, workers = cluztrr)


county_id <- c(19,205)
val_year <- c(1998:1999)

foo <- expand.grid(county_id,val_year)


foreach(i = 1:nrow(foo), globals = c("in_data1","in_data2","in_data3"), .export = c("years","st_id","crop_id")) %dopar% {
  my_function(foo[i,]$Var1,foo[i,]$Var2)
}

stopCluster(cluztrr)

Error in { : task 1 failed - "object 'in_data1' not found"
In addition: Warning message:
In e$fun(obj, substitute(ex), parent.frame(), e$data) :
  already exporting variable(s): st_id, crop_id

1 answer:

Answer 0 (score: 2)

The future is now :-)

Use the future package for parallel computing in R. doFuture is a companion package that lets loops (foreach loops) run on future backends.

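library(doFuture)
registerDoFuture()
# plan(multiprocess) was the original call; it is deprecated in recent
# versions of future, and multisession is the portable replacement
plan(multisession)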

cty_id <- c(205,15,37,59,25,133,11,23,21,19)
val_yr <- c(1998:2017)
my_function <- function(X,Y) {
    cat(X, Y, "\n")
}

result <- foreach(i = cty_id) %dopar% {
    foreach(j = val_yr) %do% {
        my_function(i, j)
    }
}
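The nested loop above returns a list of lists, and `cat()` only prints from the workers. As a minimal sketch of how the results could be collected instead, the version below assumes a stand-in `my_function` that returns a one-row data frame (hypothetical, not the asker's real function) and fixes the worker count at four with `plan(multisession, workers = 4)`. As far as I know, future detects the globals a loop body needs automatically, so no explicit export list is passed here.

```r
library(doFuture)   # attaches foreach and future as dependencies
registerDoFuture()
plan(multisession, workers = 4)  # assumption: four background R sessions

# Hypothetical stand-in for the real function: returns one row per call
my_function <- function(cty_id, val_yr) {
  data.frame(cty_id = cty_id, val_yr = val_yr)
}

cty_id <- c(205, 15, 37, 59, 25, 133, 11, 23, 21, 19)
val_yr <- 1998:2017

# Outer loop runs in parallel over counties;
# inner loop runs sequentially over years inside each worker
result <- foreach(i = cty_id, .combine = rbind) %dopar% {
  foreach(j = val_yr, .combine = rbind) %do% {
    my_function(i, j)
  }
}

plan(sequential)  # release the background workers
dim(result)
```

Using `.combine = rbind` at both levels stacks the per-call rows into a single data frame, one row per county and year.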

Edit

This is how I would write such code for myself (collapsing multiple loops into one):

A <- c(205, 15, 37, 59, 25, 133, 11, 23, 21, 19)
B <- c(1998:2017)
foo <- expand.grid(A, B)
my_function <- function(X, Y) {
    cat(X, Y, "\n")
}
foreach(i = 1:nrow(foo)) %dopar% {
    my_function(foo[i, ]$Var1, foo[i, ]$Var2)
}
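Since the question asks specifically about four cores on Linux, the same flattened grid can also be fed to `mclapply()` from the base parallel package. This is only a sketch under assumptions: `my_function` here is a hypothetical stand-in that returns a label, and the column names `cty_id` and `val_yr` on the grid are chosen for illustration.

```r
library(parallel)

# Hypothetical stand-in for the real function: returns a label per task
my_function <- function(cty_id, val_yr) {
  paste(cty_id, val_yr, sep = "_")
}

cty_id <- c(205, 15, 37, 59, 25, 133, 11, 23, 21, 19)
val_yr <- 1998:2017

# One row per (county, year) combination: 10 * 20 = 200 tasks
grid <- expand.grid(cty_id = cty_id, val_yr = val_yr)

# Fork four worker processes; forking works on Linux and macOS, not Windows
results <- mclapply(
  seq_len(nrow(grid)),
  function(i) my_function(grid$cty_id[i], grid$val_yr[i]),
  mc.cores = 4L
)

length(results)
```

Because `mclapply()` forks the current R process, the workers see the objects already loaded in the parent session (for example large input tables), so nothing has to be exported by hand.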

More information about future can be found on its GitHub page, and there is a really good introduction on YouTube.