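I have a function named "my_function" that I want to run in parallel on multiple cores under Linux using mcapply. I have 10 cty_id values, and each cty_id has to be run for 20 years. How can I use mcapply with 4 cores to run this quickly? I have tested my function by running one county and one year at a time, and it works fine. However, I want to speed up the process instead of manually changing the year and cty_id each time.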
cty_id <- c(205,15,37,59,25,133,11,23,21,19)
val_yr <- c(1998:2017)
my_function <- function(cty_id,val_yr) {
<do something here> ()
}
My code is below, but it does not get the job done and crashes.
library("parallel")
mcapply(c(205,15,37,59,25,133,11,23,21,19),FUN=my_function, val_yr=years[1998:2017], 4L)
Can someone help me run this faster?
Please tell me which global variables I need to define.
Edit: the modified R code is below.
in_file11 <- 'PLSS_KS_All_1999_2017.txt'
in_file12 <- 'PLSS_KS_All_WeeklySM_1998_2017_BILINEAR.txt'
in_file13 <- 'PRISM_WeeklyPrcp_Sum_800m_1998_2017_BILINEAR.txt'
in_data11 <- fread(in_file11,drop = 1)
in_data12 <- fread(in_file12,drop = 1)
in_data13 <- fread(in_file13,drop = 1)
in_datan <- as.data.table(full_join(in_data12, in_data13))
in_data1 <- as.data.table(full_join(in_data11, in_datan))
in_file2 <- 'KS_pp_Wheat_hist_YieldID_1998_2017.csv'
in_file3 <- 'All_counties_1999_2017.csv'
in_data2 <- fread(in_file2)
in_data3 <- fread(in_file3)
years <- c(1998:2017)
st_id <- c(15)
crop_id <- c(11)
my_function <- function(cty_id,val_yr) {
<do something here> ()
}
registerDoFuture()
plan(multiprocess)
num.cores <- detectCores()-1
cluztrr <- makeCluster(num.cores)
registerDoParallel(cl = cluztrr)
plan(cluster, workers = cluztrr)
county_id <- c(19,205)
val_year <- c(1998:1999)
foo <- expand.grid(county_id,val_year)
foreach(i = 1:nrow(foo), globals = c("in_data1","in_data2","in_data3"), .export = c("years","st_id","crop_id")) %dopar% {
my_function(foo[i,]$Var1,foo[i,]$Var2)
}
stopCluster(cluztrr)
Error in { : task 1 failed - "object 'in_data1' not found"
In addition: Warning message:
In e$fun(obj, substitute(ex), parent.frame(), e$data) :
already exporting variable(s): st_id, crop_id
Answer 0 (score: 2)
The future is now :-)
Use the future package for parallel computing in R. doFuture is a companion package for loops (foreach loops).
library(doFuture)
registerDoFuture()
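plan(multisession)  # 'multiprocess' is deprecated in recent versions of future; plan(multicore) is the forked alternative on Linux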
cty_id <- c(205,15,37,59,25,133,11,23,21,19)
val_yr <- c(1998:2017)
my_function <- function(X,Y) {
cat(X, Y, "\n")
}
result <- foreach(i = cty_id) %dopar% {
foreach(j = val_yr) %do% {
my_function(i, j)
}
}
Edit
This is how I would write such code for myself (collapsing the nested loops into a single one):
A <- c(205, 15, 37, 59, 25, 133, 11, 23, 21, 19)
B <- c(1998:2017)
foo <- expand.grid(A, B)
my_function <- function(X, Y) {
cat(X, Y, "\n")
}
foreach(i = 1:nrow(foo)) %dopar% {
my_function(foo[i, ]$Var1, foo[i, ]$Var2)
}
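Regarding the "object 'in_data1' not found" error in the question: my reading is that calling registerDoParallel() after registerDoFuture() replaces the foreach backend, so the loop runs on the plain doParallel cluster, which does not export objects that are only referenced inside my_function. Below is a minimal sketch that registers doFuture alone and relies on future's automatic detection of globals. The contents of in_data1 and the body of my_function are hypothetical stand-ins, not the asker's real data or function, and workers = 4 is taken from the question.

```r
library(doFuture)                 # also attaches foreach and future
registerDoFuture()                # the only backend registered in this sketch
plan(multisession, workers = 4)   # 4 background R sessions

# Hypothetical stand-in for one of the question's data tables
in_data1 <- data.frame(cty_id = c(19, 205), offset = c(1, 2))

# Hypothetical stand-in for my_function; it reads the global in_data1
my_function <- function(cty_id, val_yr) {
  row <- in_data1[in_data1$cty_id == cty_id, ]
  data.frame(cty_id = cty_id, val_yr = val_yr, value = val_yr + row$offset)
}

# One row per (county, year) combination
foo <- expand.grid(cty_id = c(19, 205), val_yr = 1998:1999)

# .combine = rbind stacks the per-task data frames into one result
res <- foreach(i = seq_len(nrow(foo)), .combine = rbind) %dopar% {
  my_function(foo$cty_id[i], foo$val_yr[i])
}

plan(sequential)                  # shut the background workers down
```

If a global is still not picked up in a real script, it can be listed explicitly through foreach's .export argument; note that foreach has no globals argument, so any extra named argument such as globals = ... is treated as another iteration variable.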
More information about future is available on its GitHub page, and there is a really good introduction on YouTube.
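For completeness, the same grid can also be run with the parallel package that the question started from. In the question's call, the function name is misspelled (the function in parallel is mclapply), years[1998:2017] indexes positions 1998 to 2017 of a 20-element vector and so yields NA values, and the trailing 4L is passed on to my_function rather than setting the number of cores. The following is a minimal sketch under the assumption that my_function returns a data frame; its body here is a hypothetical placeholder.

```r
library(parallel)

cty_id <- c(205, 15, 37, 59, 25, 133, 11, 23, 21, 19)
val_yr <- 1998:2017

# Hypothetical placeholder for the real my_function
my_function <- function(cty_id, val_yr) {
  data.frame(cty_id = cty_id, val_yr = val_yr)
}

# All 10 x 20 = 200 (county, year) combinations
grid <- expand.grid(cty_id = cty_id, val_yr = val_yr)

# One task per row of the grid, spread over 4 forked worker processes
results <- mclapply(
  seq_len(nrow(grid)),
  function(i) my_function(grid$cty_id[i], grid$val_yr[i]),
  mc.cores = 4L
)

# Stack the per-task results into a single data frame
out <- do.call(rbind, results)
```

Because mclapply forks the current R process on Linux, the workers see every object already defined in the session, so nothing has to be exported by hand. It does not run in parallel on Windows.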