鉴于我正在使用的数据大小,我想并行处理
我已经设置了如下代码,保持一个核心空闲,因此没有使用整个机器
library(DoMC)
library(foreach)
library(itertools)
num_cores <- round(detectCores()*1-1) # num_cores is 7 in this case
registerDoMC(num_cores)
test_prediction <-
data.frame(
foreach(d=isplitRows(test, chunks = num_cores),
.combine=c,
.packages=c("stats")) %dopar% {
predict(cfModel, newdata=d)
}
)
问题是返回的test_prediction比测试的行少,我无法弄清楚为什么
几次尝试返回的行告诉我,.combine
中的foreach
没有从某些核心收集数据,虽然我不确定如何确认这个理论< / p>
总行数603,054
Attempt 1: rows returned > 516,903 - 6/7s of data returned
Attempt 2: rows returned > 344,602 - 4/7s of data returned
Attempt 3: rows returned > 430,753 - 5/7s of data returned
这只有在并行运行时才会发生,如果我使用%do%
选项而不是返回正确的行数 - 虽然我不确定如何进一步研究这个理论?
一般情况下,如果有更好的方法并行运行,有什么帮助吗?将不胜感激
会议信息:
> sessionInfo()
R version 3.3.0 beta (2016-03-30 r70404)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.4 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=C LC_PAPER=C LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] miniCRAN_0.2.7 markdown_0.7.7 slackr_1.4.2 readr_0.2.2 readxl_0.1.1 testthat_1.0.2 R2HTML_2.3.2 itertools_0.1-3 XML_3.98-1.4
[10] rvest_0.3.2 xml2_1.0.0 devtools_1.12.0 doParallel_1.0.10 rjson_0.2.15 RCurl_1.95-4.8 bitops_1.0-6 bit64_0.9-5 bit_1.1-12
[19] qcc_2.6 optiRum_0.37.3 scales_0.4.0 doMC_1.3.4 iterators_1.0.8 foreach_1.4.3 pryr_0.1.2 party_1.0-25 strucchange_1.5-1
[28] sandwich_2.3-4 zoo_1.7-13 modeltools_0.2-21 mvtnorm_1.0-5 e1071_1.6-7 randomForest_4.6-12 caret_6.0-70 lattice_0.20-29 timeDate_3012.100
[37] Kmisc_0.5.0 reshape2_1.4.1 gridExtra_2.2.1 tidyr_0.5.1 dplyr_0.5.0 plyr_1.8.4 data.table_1.9.6 sendmailR_1.2-1 RPostgreSQL_0.4-1
[46] ggplot2_2.1.0 lubridate_1.5.6 stringr_1.0.0 sqldf_0.4-10 RSQLite_1.0.0 DBI_0.4-1 gsubfn_0.6-6 proto_0.3-10
loaded via a namespace (and not attached):
[1] nlme_3.1-128 pbkrtest_0.4-6 httr_1.2.1 tools_3.3.0 R6_2.1.2 lazyeval_0.2.0 mgcv_1.8-3 colorspace_1.2-6 nnet_7.3-8 withr_1.0.2
[11] compiler_3.3.0 chron_2.3-47 quantreg_5.26 SparseM_1.7 AUC_0.3.0 digest_0.6.9 minqa_1.2.4 base64enc_0.1-3 lme4_1.1-12 jsonlite_1.0
[21] car_2.1-2 magrittr_1.5 Matrix_1.2-6 Rcpp_0.12.6 munsell_0.4.3 stringi_1.1.1 multcomp_1.4-6 MASS_7.3-35 crayon_1.3.2 splines_3.3.0
[31] knitr_1.13 tcltk_3.3.0 codetools_0.2-9 nloptr_1.0.4 MatrixModels_0.4-1 gtable_0.2.0 assertthat_0.1 coin_1.1-2 class_7.3-11 survival_2.39-5
[41] tibble_1.1 memoise_1.0.0 TH.data_1.0-7
答案 0 :(得分:0)
这对我来说很好用:
library(doMC)
library(foreach)
library(itertools)
num_cores <- 2
registerDoMC(num_cores)
predict.fake <- function(object, newdata) {
rowSums(newdata)
}
cfModel <- structure(NULL, class = "fake")
test <- matrix(rnorm(2000), ncol = 2)
pred <- predict(cfModel, test)
test_prediction <-
foreach(d=isplitRows(test, chunks = num_cores),
.combine=c,
.packages=c("stats")) %dopar% {
predict(cfModel, newdata=d)
}
all.equal(test_prediction, pred)