This code works as expected:
library(dplyr)
data <- list(t1 = "hello world.", t2 = "bye world")
library(doMC)
registerDoMC(3)
res <- foreach(t = data) %dopar% {
  print(sprintf("processing %s", t))
  data.frame(text = t) %>%
    dplyr::count(text)
}
print(res)
However, this code only prints "processing hello world." and "processing bye world" and then hangs (no exception is thrown).
library(dplyr)
coreNLP::initCoreNLP()
data <- list(t1 = "hello world.", t2 = "bye world")
library(doMC)
registerDoMC(3)
res <- foreach(t = data) %dopar% {
  print(sprintf("processing %s", t))
  coreNLP::annotateString(t)$token
}
print(res)
If I change %dopar% to %do%, the code above works as expected.
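I don't understand what causes this behavior. Why does calling a coreNLP function inside %dopar% make R hang, while other packages work fine? Is this related to coreNLP's dependency on Java?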
Here is the output of sessionInfo():
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.4.0
Answer 0 (score: 1)
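Your first example works fine for me on what looks like a similar setup. My session info after running the example is below; be sure to retry in a fresh R session (R --vanilla). I have four cores (according to parallel::detectCores()).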
sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS
Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] doMC_1.3.4 iterators_1.0.8 foreach_1.4.3 dplyr_0.5.0
loaded via a namespace (and not attached):
[1] compiler_3.4.0 magrittr_1.5 R6_2.2.0 assertthat_0.2.0
[5] DBI_0.6-1 tibble_1.3.0 Rcpp_0.12.10 codetools_0.2-15
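Your second example does not work for me either. The output is below. My guess is that the forked processes cannot share the same underlying Java process/service that coreNLP relies on; I don't really know coreNLP.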
> res <- foreach(t = data) %dopar% {
+
+ print(sprintf("processing %s", t))
+
+ coreNLP::annotateString(t)$token
+
+ }
[1] "processing hello world."
[1] "processing bye world"
^CError in selectChildren(ac, 1) :
Java called System.exit(130) requesting R to quit - trying to recover
Error during wrapup: C stack usage 591577121812 is too close to the limit
*** caught segfault ***
address 0x2, cause 'memory not mapped'
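If that guess is right, one thing worth trying is to avoid forking altogether: start separate worker processes (a PSOCK cluster) and let each worker initialize coreNLP itself, so every worker gets its own JVM. The following is only a sketch under that assumption, and I have not run it against coreNLP. It uses parLapply from the base parallel package instead of foreach/%dopar%, and run_with_worker_init is a hypothetical helper name, not part of any package.

```r
library(parallel)

# Hypothetical helper (not part of coreNLP or parallel):
# apply `work` to each element of `items` on fresh PSOCK workers,
# calling `init` once in every worker before any work is done.
run_with_worker_init <- function(items, init, work, workers = 2) {
  # PSOCK workers are new R processes, not forks of the current session
  cl <- makeCluster(workers, type = "PSOCK")
  on.exit(stopCluster(cl), add = TRUE)

  # Run init() in each worker; return NULL so that nothing
  # worker-specific has to be serialized back to the master
  clusterCall(cl, function(f) { f(); NULL }, init)

  parLapply(cl, items, work)
}

# Intended use with the data from the question
# (assumes coreNLP and its models are installed on this machine):
# data <- list(t1 = "hello world.", t2 = "bye world")
# res <- run_with_worker_init(
#   data,
#   init = function() coreNLP::initCoreNLP(),
#   work = function(t) coreNLP::annotateString(t)$token
# )
# print(res)
```

Each worker pays the coreNLP start-up cost once, so this only pays off when there are many strings to annotate. If you prefer to keep the foreach loop, the same kind of cluster can be registered as the backend with doParallel::registerDoParallel(cl) instead of doMC.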