我使用mlr进行文本分类任务。我已经编写了一个自定义过滤器,如此处所述
过滤器按预期工作,但是当我尝试使用parallelization时,我收到以下错误:
Exporting objects to slaves for mode socket: .mlr.slave.options
Mapping in parallel: mode = socket; cpus = 4; elements = 2.
Error in stopWithJobErrorMessages(inds, vcapply(result.list[inds], as.character)) :
Errors occurred in 2 slave jobs, displaying at most 10 of them:
00001: Error in parallel:::.slaveRSOCK() :
Assertion on 'method' failed: Must be element of set {'anova.test','carscore','cforest.importance','chi.squared','gain.ratio','information.gain','kruskal.test','linear.correlation','mrmr','oneR','permutation.importance','randomForest.importance','randomForestSRC.rfsrc','randomForestSRC.var.select','rank.correlation','relief','rf.importance','rf.min.depth','symmetrical.uncertainty','univariate','univariate.model.score','variance'}.
我从错误中假设我的自定义过滤器需要是集合中的一个元素,才有可能并行工作,但是如果(a)这是可能的话,我们无法设法解决,(b)如果是,我该怎么办呢。
提前感谢您的帮助, 阿扎姆
已添加:测试脚本 由于灵敏度,我无法让您看到我正在使用的实际脚本/数据,但此示例再现了我看到的错误。除了自定义功能选择和数据集之外,设置学习者和评估它的步骤与我在“真实”中的步骤一样。脚本。在我的实际情况中,如果删除parallelStartSocket()命令,则脚本将按预期运行。
我还应该补充说,在使用RBF内核调整SVM的超参数时,我已成功使用(或至少我没有收到错误)并行处理:该脚本与makeParamSet()定义完全相同。
library(parallelMap)
library(mlr)
library(kernlab)
makeFilter(
name = "nonsense.filter",
desc = "Calculates scores according to alphabetical order of features",
pkg = "mlr",
supported.tasks = c("classif", "regr", "surv"),
supported.features = c("numerics", "factors", "ordered"),
fun = function(task, nselect, decreasing = TRUE, ...) {
feats = getTaskFeatureNames(task)
imp = order(feats, decreasing = decreasing)
names(imp) = feats
imp
}
)
# set up svm with rbf kernal
svm.lrn <- makeLearner("classif.ksvm",predict.type = "response")
# wrap learner with filter
svm.lrn <- makeFilterWrapper(svm.lrn, fw.method = "nonsense.filter")
# define feature selection parameters
ps.svm = makeParamSet(
makeDiscreteParam("fw.abs", values = seq(2, 3, 1))
)
# define inner search and evaluation strategy
ctrl.svm = makeTuneControlGrid()
inner.svm = makeResampleDesc("CV", iters = 5, stratify = TRUE)
svm.lrn <- makeTuneWrapper(svm.lrn, resampling = inner.svm, par.set = ps.svm,
control = ctrl.svm)
# set up outer resampling
outer.svm <- makeResampleDesc("CV", iters = 10, stratify = TRUE)
# run it...
parallelStartSocket(2)
run.svm <- resample(svm.lrn, iris.task,
resampling = outer.svm, extract = getTuneResult)
parallelStop()
答案 0 :(得分:1)
问题是makeFilter
注册了S3方法,这些方法在单独的R进程中不可用。您有两个选项可以完成这项工作:或者只使用parallelStartMulticore(2)
以便所有内容都在同一个R进程中运行,或者告诉parallelMap
有关其他R进程中需要存在的部分。
后者有两个部分。首先,使用parallelLibrary("mlr")
在任何地方加载mlr并将过滤器的定义拉出到可以使用parallelSource()
加载的单独文件中。例如:
filter.R:
makeFilter(
name = "nonsense.filter",
desc = "Calculates scores according to alphabetical order of features",
pkg = "mlr",
supported.tasks = c("classif", "regr", "surv"),
supported.features = c("numerics", "factors", "ordered"),
fun = function(task, nselect, decreasing = TRUE, ...) {
feats = getTaskFeatureNames(task)
imp = order(feats, decreasing = decreasing)
names(imp) = feats
imp
}
)
main.R:
library(parallelMap)
library(mlr)
library(kernlab)
parallelStartSocket(2)
parallelLibrary("mlr")
parallelSource("filter.R")
# set up svm with rbf kernal
svm.lrn = makeLearner("classif.ksvm",predict.type = "response")
# wrap learner with filter
svm.lrn = makeFilterWrapper(svm.lrn, fw.method = "nonsense.filter")
# define feature selection parameters
ps.svm = makeParamSet(
makeDiscreteParam("fw.abs", values = seq(2, 3, 1))
)
# define inner search and evaluation strategy
ctrl.svm = makeTuneControlGrid()
inner.svm = makeResampleDesc("CV", iters = 5, stratify = TRUE)
svm.lrn = makeTuneWrapper(svm.lrn, resampling = inner.svm, par.set = ps.svm,
control = ctrl.svm)
# set up outer resampling
outer.svm = makeResampleDesc("CV", iters = 10, stratify = TRUE)
# run it...
run.svm = resample(svm.lrn, iris.task, resampling = outer.svm, extract = getTuneResult)
parallelStop()