Question

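I have seen many questions on this topic, but none of them seems to have a satisfying answer to my problem. I intend to use caret::train() together with the doParallel package on a Windows machine. The documentation (The caret package: 9 Parallel Processing) tells me that train() runs in parallel by default if it finds a registered cluster (although the documentation itself uses the doMC package). When I set up a cluster with doParallel and follow the example calculation from its documentation (Getting Started with doParallel and foreach), everything works fine. When I unregister the cluster and run caret::train(), everything works fine. But when I create a new cluster and try to run caret::train(), it fails with Error in serialize(data, node$con) : error writing to connection. I have included the logs below. I do not understand how caret::train() can work in non-parallel mode but not in parallel mode, even though the cluster itself appears to be set up correctly.

Libraries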

library(caret)
library(microbenchmark)
library(doParallel)

Session info

sessionInfo()

R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] doParallel_1.0.10      iterators_1.0.8        foreach_1.4.3          microbenchmark_1.4-2.1
[5] caret_6.0-76           ggplot2_2.2.1          lattice_0.20-35       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.11       compiler_3.4.1     nloptr_1.0.4       plyr_1.8.4         tools_3.4.1       
 [6] lme4_1.1-13        tibble_1.3.3       nlme_3.1-131       gtable_0.2.0       mgcv_1.8-17       
[11] rlang_0.1.1        Matrix_1.2-10      SparseM_1.77       mvtnorm_1.0-6      stringr_1.2.0     
[16] hms_0.3            MatrixModels_0.4-1 stats4_3.4.1       grid_3.4.1         nnet_7.3-12       
[21] R6_2.2.2           survival_2.41-3    multcomp_1.4-6     TH.data_1.0-8      minqa_1.2.4       
[26] readr_1.1.1        reshape2_1.4.2     car_2.1-5          magrittr_1.5       scales_0.4.1      
[31] codetools_0.2-15   ModelMetrics_1.1.0 MASS_7.3-47        splines_3.4.1      pbkrtest_0.4-7    
[36] colorspace_1.3-2   quantreg_5.33      sandwich_2.4-0     stringi_1.1.5      lazyeval_0.2.0    
[41] munsell_0.4.3      zoo_1.8-0

Running the example from the doParallel documentation (no error)

cores_2_use <- floor(0.8 * detectCores())
cl <- makeCluster(cores_2_use, outfile = "parallel_log1.txt")
registerDoParallel(cl)

x <- iris[which(iris[,5] != "setosa"), c(1,5)]
trials <- 100
temp <- microbenchmark(
  r <- foreach(icount(trials), .combine=cbind) %dopar% {
    ind <- sample(100, 100, replace=TRUE)
    result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
    coefficients(result1)}
  )

parallel::stopCluster(cl)
foreach::registerDoSEQ()

Simulated data

x1 = rnorm(100)           # some continuous variables 
x2 = rnorm(100)
z = 1 + 2 * x1 + 3 * x2        # linear combination with a bias
pr = 1 / (1 + exp(-z))         # pass through an inv-logit function
y = rbinom(100, 1, pr)      # bernoulli response variable
df = data.frame(y = as.factor(ifelse(y == 0, "no", "yes")), x1 = x1, x2 = x2)

Running caret::train() non-parallel (no error)

# train control function
ctrl <- 
  trainControl(
    method = "repeatedcv", 
    number = 10,
    repeats = 5,
    classProbs = TRUE,
    summaryFunction = twoClassSummary)

# train function
microbenchmark(
  glm_nopar =
    train(y ~ .,
          data = df,
          method = "glm",
          family = "binomial",
          metric = "ROC",
          trControl = ctrl),
  times = 5)

#Unit: milliseconds
#      expr      min       lq     mean   median       uq      max neval
# glm_nopar 691.9643 805.1762 977.1054 895.9903 1018.112 1474.284     5

Running caret::train() in parallel (error)

cores_2_use <- floor(0.8 * detectCores())
cl <- makeCluster(cores_2_use, outfile = "parallel_log2.txt")
registerDoParallel(cl)

microbenchmark(
  glm_par =
    train(y ~ .,
          data = df,
          method = "glm",
          family = "binomial",
          metric = "ROC",
          trControl = ctrl),
  times = 5)

#Error in serialize(data, node$con) : error writing to connection

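Edit (tried without the parallel::makeCluster() call)

As in the Linux setup (see below), I also tried it without the parallel::makeCluster() call, i.e. as shown here, but it resulted in the same error.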

cores_2_use <- floor(0.8 * detectCores())
registerDoParallel(cores_2_use)
...

Output of parallel_log1.txt

starting worker pid=3880 on localhost:11442 at 16:00:52.764
starting worker pid=3388 on localhost:11442 at 16:00:53.405
starting worker pid=9920 on localhost:11442 at 16:00:53.789
starting worker pid=4248 on localhost:11442 at 16:00:54.229
starting worker pid=3548 on localhost:11442 at 16:00:54.572
starting worker pid=5704 on localhost:11442 at 16:00:54.932
starting worker pid=7740 on localhost:11442 at 16:00:55.291
starting worker pid=2164 on localhost:11442 at 16:00:55.653
starting worker pid=7428 on localhost:11442 at 16:00:56.011
starting worker pid=6116 on localhost:11442 at 16:00:56.372
starting worker pid=1632 on localhost:11442 at 16:00:56.731
starting worker pid=9160 on localhost:11442 at 16:00:57.092
starting worker pid=2956 on localhost:11442 at 16:00:57.435
starting worker pid=7060 on localhost:11442 at 16:00:57.811
starting worker pid=7344 on localhost:11442 at 16:00:58.170
starting worker pid=6688 on localhost:11442 at 16:00:58.561
starting worker pid=9308 on localhost:11442 at 16:00:58.920
starting worker pid=9260 on localhost:11442 at 16:00:59.281
starting worker pid=6212 on localhost:11442 at 16:00:59.641

Output of parallel_log2.txt

starting worker pid=17640 on localhost:11074 at 15:12:21.118
starting worker pid=7776 on localhost:11074 at 15:12:21.494
starting worker pid=15128 on localhost:11074 at 15:12:21.961
starting worker pid=13724 on localhost:11074 at 15:12:22.345
starting worker pid=17384 on localhost:11074 at 15:12:22.714
starting worker pid=8472 on localhost:11074 at 15:12:23.228
starting worker pid=8392 on localhost:11074 at 15:12:23.597
starting worker pid=17412 on localhost:11074 at 15:12:23.979
starting worker pid=15996 on localhost:11074 at 15:12:24.364
starting worker pid=16772 on localhost:11074 at 15:12:24.743
starting worker pid=18268 on localhost:11074 at 15:12:25.120
starting worker pid=13504 on localhost:11074 at 15:12:25.500
starting worker pid=5156 on localhost:11074 at 15:12:25.899
starting worker pid=13544 on localhost:11074 at 15:12:26.275
starting worker pid=1764 on localhost:11074 at 15:12:26.647
starting worker pid=8076 on localhost:11074 at 15:12:27.028
starting worker pid=13716 on localhost:11074 at 15:12:27.414
starting worker pid=14596 on localhost:11074 at 15:12:27.791
starting worker pid=15664 on localhost:11074 at 15:12:28.170
Loading required package: caret
Loading required package: lattice
Loading required package: ggplot2
loaded caret and set parent environment
starting worker pid=3932 on localhost:11442 at 16:01:44.384
starting worker pid=6848 on localhost:11442 at 16:01:44.731
starting worker pid=5400 on localhost:11442 at 16:01:45.098
starting worker pid=9832 on localhost:11442 at 16:01:45.475
starting worker pid=8448 on localhost:11442 at 16:01:45.928
starting worker pid=1284 on localhost:11442 at 16:01:46.289
starting worker pid=9892 on localhost:11442 at 16:01:46.632
starting worker pid=8312 on localhost:11442 at 16:01:46.991
starting worker pid=3696 on localhost:11442 at 16:01:47.349
starting worker pid=9108 on localhost:11442 at 16:01:47.708
starting worker pid=8548 on localhost:11442 at 16:01:48.083
starting worker pid=7288 on localhost:11442 at 16:01:48.442
starting worker pid=6872 on localhost:11442 at 16:01:48.801
starting worker pid=3760 on localhost:11442 at 16:01:49.145
starting worker pid=3468 on localhost:11442 at 16:01:49.503
starting worker pid=2500 on localhost:11442 at 16:01:49.862
starting worker pid=7200 on localhost:11442 at 16:01:50.205
starting worker pid=7820 on localhost:11442 at 16:01:50.564
starting worker pid=8852 on localhost:11442 at 16:01:50.923
Error in unserialize(node$con) : 
  ReadItem: unknown type 0, perhaps written by later version of R
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
Execution halted

Edit (tried on Ubuntu)

Libraries

library(caret)
library(microbenchmark)
library(doMC)

sessionInfo()

R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=de_DE.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] doMC_1.3.4             iterators_1.0.8        foreach_1.4.3         
[4] microbenchmark_1.4-2.1 caret_6.0-77           ggplot2_2.2.1         
[7] lattice_0.20-35       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.11       ddalpha_1.2.1      compiler_3.4.1     DEoptimR_1.0-8    
 [5] gower_0.1.2        plyr_1.8.4         bindr_0.1          class_7.3-14      
 [9] tools_3.4.1        rpart_4.1-11       ipred_0.9-6        lubridate_1.6.0   
[13] tibble_1.3.3       nlme_3.1-131       gtable_0.2.0       pkgconfig_2.0.1   
[17] rlang_0.1.1        Matrix_1.2-11      RcppRoll_0.2.2     prodlim_1.6.1     
[21] bindrcpp_0.2       withr_2.0.0        stringr_1.2.0      dplyr_0.7.1       
[25] recipes_0.1.0      stats4_3.4.1       nnet_7.3-12        CVST_0.2-1        
[29] grid_3.4.1         robustbase_0.92-7  glue_1.1.1         R6_2.2.2          
[33] survival_2.41-3    lava_1.5           purrr_0.2.2.2      reshape2_1.4.2    
[37] kernlab_0.9-25     magrittr_1.5       DRR_0.0.2          splines_3.4.1     
[41] scales_0.4.1       codetools_0.2-15   ModelMetrics_1.1.0 MASS_7.3-47       
[45] assertthat_0.2.0   dimRed_0.1.0       timeDate_3012.100  colorspace_1.3-2  
[49] stringi_1.1.5      lazyeval_0.2.0     munsell_0.4.3

The example from Getting Started with doMC and foreach works as expected.

caret example, non-parallel

microbenchmark(
  glm_nopar =
    train(y ~ .,
          data = df,
          method = "glm",
          family = "binomial",
          metric = "ROC",
          trControl = ctrl),
  times = 5)

#Unit: seconds
#     expr      min       lq     mean   median       uq      max neval
#glm_nopar 1.093237 1.098342 1.481444 1.102867 2.001443 2.111333     5

caret in parallel with the Windows setup (gives an error)

cores_2_use <- floor(0.8 * parallel::detectCores())
cl <- parallel::makeCluster(cores_2_use, outfile = "parallel_log2_linux.txt")
registerDoMC(cl)

microbenchmark(
  glm_par =
    train(y ~ .,
          data = df,
          method = "glm",
          family = "binomial",
          metric = "ROC",
          trControl = ctrl),
  times = 5)

# Error in getOper(ctrl$allowParallel && getDoParWorkers() > 1) :(list) object cannot be coerced to type 'double'

parallel_log2_linux.txt

starting worker pid=6343 on localhost:11836 at 16:05:17.781
starting worker pid=6353 on localhost:11836 at 16:05:18.025
starting worker pid=6362 on localhost:11836 at 16:05:18.266

caret in parallel without the parallel::makeCluster() call (no error)

It is not clear to me how to define the log output in this setup.

cores_2_use <- floor(0.8 * parallel::detectCores())
registerDoMC(cores_2_use)

microbenchmark(
  glm_par =
    train(y ~ .,
          data = df,
          method = "glm",
          family = "binomial",
          metric = "ROC",
          trControl = ctrl),
  times = 5)

#Unit: milliseconds
#    expr      min       lq     mean   median       uq      max neval
# glm_par 991.8075 997.4397 1013.686 998.8241 1004.381 1075.978     5

Answer 1

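I tried a Windows 10 machine with fewer cores but the same code setup. However, I used the development version of caret from GitHub (installed via devtools::install_github('topepo/caret/pkg/caret')) together with R 3.4.1, and the problem could not be reproduced. The parallel cluster ran without problems with the code below. Unfortunately, I no longer have access to the original Windows 7 workstation to check whether the problem persists with the development version of caret and/or a newer R version.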

library(doParallel)
cores_2_use <- floor(0.8 * detectCores())
cl <- makeCluster(cores_2_use, outfile = "parallel_log.txt")
registerDoParallel(cl)

glm_par <-
  microbenchmark(glm_par =
    train(default ~ .,
            data = benchmark_train_data,
            method = "glm",
            family = "binomial",
            metric = "ROC",
            trControl = ctrl),
    times = 5
    )

glm_par

#Unit: seconds
#    expr      min       lq     mean   median       uq      max neval
# glm_par 13.14082 13.25298 16.77678 13.64924 13.78132 30.05955     5

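Edit (non-parallel benchmark)

This is the same code run on a single core (as opposed to the six cores used in parallel above). As expected, the parallel setup performs better.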

#Unit: seconds
#      expr      min       lq     mean   median       uq      max neval
# glm_nopar 25.44122 25.52031 25.64818 25.53692 25.56496 26.17751     5
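
A comparison like the one above can also be made without registering and unregistering the backend between runs, by toggling the allowParallel argument of trainControl(). The following is only a minimal sketch, assuming caret, doParallel and the packages required by twoClassSummary are installed; the simulated data follow the question, while make_ctrl, fit_glm, the seed and the worker count of 2 are illustrative choices, not part of the original posts.

```r
library(caret)
library(doParallel)

# Simulated data as in the question (the seed is an illustrative choice)
set.seed(1)
x1 <- rnorm(100)
x2 <- rnorm(100)
pr <- 1 / (1 + exp(-(1 + 2 * x1 + 3 * x2)))
y  <- rbinom(100, 1, pr)
df <- data.frame(y = factor(ifelse(y == 0, "no", "yes")), x1 = x1, x2 = x2)

# Hypothetical helper: same resampling scheme, parallelism switched on or off
make_ctrl <- function(parallel) {
  trainControl(method = "repeatedcv", number = 10, repeats = 5,
               classProbs = TRUE, summaryFunction = twoClassSummary,
               allowParallel = parallel)
}

# Hypothetical helper: fit the same model under a given control object
fit_glm <- function(ctrl) {
  train(y ~ ., data = df, method = "glm", family = "binomial",
        metric = "ROC", trControl = ctrl)
}

cl <- makeCluster(2)       # 2 workers, purely for illustration
registerDoParallel(cl)

t_seq <- system.time(fit_glm(make_ctrl(FALSE)))  # backend registered but not used
t_par <- system.time(fit_glm(make_ctrl(TRUE)))   # resamples sent to the workers

stopCluster(cl)
registerDoSEQ()            # fall back to the sequential backend

timings <- rbind(sequential = t_seq, parallel = t_par)
timings
```

For a model as cheap as a two-predictor glm on 100 rows, the cost of shipping data to the workers may outweigh any gain, so the parallel run is not guaranteed to be faster on every machine.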

Answer 2

You have to use the foreach backend that corresponds to your cluster type. If you create the cluster with parallel::makeCluster, register it with doParallel::registerDoParallel.

cl <- parallel::makeCluster(cores_2_use, outfile = "parallel_log2_linux.txt")
library(doParallel)
registerDoParallel(cl)
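
To illustrate the pairing described above: a cluster object from parallel::makeCluster() goes to registerDoParallel(), whereas registerDoMC() expects a number of cores, which would explain the coercion error in the question when a cluster object is passed to it. The sketch below is a minimal example under the assumption that doParallel is installed; the worker count of 2 and the toy foreach loop are illustrative only.

```r
library(doParallel)  # also attaches foreach, iterators and parallel

# A cluster object created by parallel::makeCluster() ...
cl <- parallel::makeCluster(2)   # 2 workers, purely for illustration

# ... is registered with the matching foreach backend
registerDoParallel(cl)

# Check what foreach sees before calling caret::train()
backend_name <- getDoParName()
n_workers    <- getDoParWorkers()

# Toy loop to confirm that the workers respond
res <- foreach(i = 1:4, .combine = c) %dopar% sqrt(i)

# Always release the workers and reset foreach afterwards
parallel::stopCluster(cl)
registerDoSEQ()
```

Checking getDoParWorkers() first is a cheap way to find out whether train() would run in parallel at all, since it falls back to sequential execution when only one worker is registered.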

Answer 3

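It looks like you are out of luck because you are on Windows.

The doMC package acts as an interface between foreach and the multicore functionality of the parallel package, originally written by Simon Urbanek and incorporated into parallel for R 2.14.0. The multicore functionality currently only works with operating systems that support the fork system call (which means that Windows isn't supported).

caret uses doMC. See caret/parallel-processing.html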

library(doMC)
registerDoMC(cores = 5)
model <- train(y ~ ., data = training, method = "rf")

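Note that the OP has edited the original post. The OP started out running on Windows.

Edit - too long for a single comment

doParallel cannot rescue caret parallelization. (But I may be wrong... please let me know through more downvotes and comments.)

1) Please try it yourself on Windows... when I tried using doParallel, it defaulted to sequential execution. (I wonder whether it works on anyone else's Windows machine.)

It makes sense that it defaults to sequential because

2) caret uses doMC. See here,

caret leverages one of the parallel processing frameworks in R to do just that. The foreach package allows R code to be run either sequentially or in parallel using several different technologies, such as the multicore or Rmpi packages (see Schmidberger et al., 2009 for summaries and descriptions of the available options). There are several R packages that work with foreach to implement these techniques, such as doMC (for multicore) or doMPI (for Rmpi).

3) doParallel is simply a merger of doMC and doSNOW. See here.

The doParallel package is a merger of doSNOW and doMC, much as parallel is a merger of snow and multicore.

Note that the author of the accepted answer in the link is Steve Weston, one of the authors of the doParallel package.

4) doMC forks processes, which Windows does not support (Windows only supports SNOW and SOCK processes). Again, see here, Steve Weston:

The multicore functionality currently only works with operating systems that support the fork system call (which means that Windows isn't supported).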
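
The fork versus socket distinction discussed in the points above can be made explicit in code. The following is a minimal sketch, assuming doParallel is installed; register_backend is a hypothetical helper introduced here for illustration, and the worker count of 2 is arbitrary. It uses a socket cluster on Windows and fork-based workers elsewhere, both through doParallel.

```r
library(doParallel)

# Hypothetical helper: register a foreach backend suited to the current OS.
# Returns the cluster object on Windows, or NULL when forking is used.
register_backend <- function(n_workers) {
  if (.Platform$OS.type == "windows") {
    # No fork() on Windows: start separate R processes connected via sockets
    cl <- parallel::makeCluster(n_workers)
    registerDoParallel(cl)
    cl
  } else {
    # Unix-alikes: doParallel can use forked workers when given a core count
    registerDoParallel(cores = n_workers)
    NULL
  }
}

cl <- register_backend(2)

# Toy loop that runs on whichever backend was registered
res <- foreach(i = 1:4, .combine = c) %dopar% i^2

# Socket clusters must be stopped explicitly; forked workers need no cleanup
if (!is.null(cl)) parallel::stopCluster(cl)
registerDoSEQ()
```

Whether this removes the error reported in the question is not established by the thread; it only shows how one script can select a backend that the operating system supports.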
