In mlr

Time: 2017-02-10 17:54:29

Tags: r mlr

I am using mlr for a text classification task. I have written a custom filter as described here:

Create Custom Filters

The filter works as expected, but when I try to use parallelization I get the following error:

Exporting objects to slaves for mode socket: .mlr.slave.options
Mapping in parallel: mode = socket; cpus = 4; elements = 2.
Error in stopWithJobErrorMessages(inds, vcapply(result.list[inds], as.character)) : 
  Errors occurred in 2 slave jobs, displaying at most 10 of them:

00001: Error in parallel:::.slaveRSOCK() : 
  Assertion on 'method' failed: Must be element of set {'anova.test','carscore','cforest.importance','chi.squared','gain.ratio','information.gain','kruskal.test','linear.correlation','mrmr','oneR','permutation.importance','randomForest.importance','randomForestSRC.rfsrc','randomForestSRC.var.select','rank.correlation','relief','rf.importance','rf.min.depth','symmetrical.uncertainty','univariate','univariate.model.score','variance'}.

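From the error I assume that my custom filter needs to be an element of that set before it can work in parallel, but I have not managed to work out (a) whether this is possible and (b) if so, how to do it.

Thanks in advance for any help, Azam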

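Added: test script. Because the material is sensitive, I cannot show the actual script or data I am using, but this example reproduces the error I see. Apart from the custom feature selection and the data set, the steps to set up the learner and evaluate it are the same as in my "real" script. As in my real case, the script runs as expected if the parallelStartSocket() command is removed.

I should also add that I have used parallel processing successfully (or at least without receiving errors) when tuning the hyperparameters of an SVM with an RBF kernel: that script is identical apart from the makeParamSet() definition.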

library(parallelMap)
library(mlr)
library(kernlab)

makeFilter(
  name = "nonsense.filter",
  desc = "Calculates scores according to alphabetical order of features",
  pkg = "mlr",
  supported.tasks = c("classif", "regr", "surv"),
  supported.features = c("numerics", "factors", "ordered"),
  fun = function(task, nselect, decreasing = TRUE, ...) {
    feats = getTaskFeatureNames(task)
    imp = order(feats, decreasing = decreasing)
    names(imp) = feats
    imp
  }
)

# set up svm with rbf kernel
svm.lrn <- makeLearner("classif.ksvm",predict.type = "response")  

# wrap learner with filter
svm.lrn <- makeFilterWrapper(svm.lrn, fw.method = "nonsense.filter")

# define feature selection parameters 

ps.svm = makeParamSet(
  makeDiscreteParam("fw.abs", values = seq(2, 3, 1)) 

)

# define inner search and evaluation strategy
ctrl.svm = makeTuneControlGrid()
inner.svm = makeResampleDesc("CV", iters = 5, stratify = TRUE)

svm.lrn <- makeTuneWrapper(svm.lrn, resampling = inner.svm, par.set = ps.svm, 
                           control = ctrl.svm)

# set up outer resampling
outer.svm <-  makeResampleDesc("CV", iters = 10, stratify = TRUE)

# run it...

parallelStartSocket(2)

run.svm <- resample(svm.lrn, iris.task, 
                    resampling = outer.svm, extract = getTuneResult)

parallelStop()

1 answer:

Answer 0 (score: 1)

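The problem is that makeFilter registers S3 methods, and these are not available in separate R processes. You have two options to make this work: either just use parallelStartMulticore(2), so that the workers are forked from the current R session and inherit the registered filter (forking is not available on Windows), or tell parallelMap about the pieces that need to exist in the other R processes.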
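To illustrate why the two parallel modes behave differently, here is a minimal sketch that uses only base R's parallel package rather than parallelMap or mlr. The object my.filter.registry is a hypothetical stand-in for whatever state makeFilter() leaves in the master session; it is not mlr's actual internal structure.

```r
library(parallel)

# Hypothetical stand-in for state that a call such as makeFilter() leaves
# behind in the master session (an assumption, not mlr's real internals).
my.filter.registry <- new.env()
assign("nonsense.filter", TRUE, envir = my.filter.registry)

# Socket workers are fresh R processes, so they know nothing about the registry.
cl <- makeCluster(2)
known.on.socket <- unlist(clusterEvalQ(cl, exists("my.filter.registry")))
stopCluster(cl)

# Forked workers are copies of this session and inherit the registry.
# Forking is unavailable on Windows, so this part is skipped there.
known.on.fork <- NA
if (.Platform$OS.type != "windows") {
  known.on.fork <- unlist(mclapply(1:2, function(i) {
    exists("nonsense.filter", envir = my.filter.registry, inherits = FALSE)
  }, mc.cores = 2))
}

print(known.on.socket)
print(known.on.fork)
```

On a Unix-like system the socket workers should report that the object is missing while the forked workers report that it is present, which mirrors the difference between parallelStartSocket() and parallelStartMulticore().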

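The second option, telling parallelMap what the workers need, has two parts. First, load mlr everywhere with parallelLibrary("mlr"). Second, move the definition of the filter into a separate file that can be loaded with parallelSource(). For example: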

filter.R:

makeFilter(
  name = "nonsense.filter",
  desc = "Calculates scores according to alphabetical order of features",
  pkg = "mlr",
  supported.tasks = c("classif", "regr", "surv"),
  supported.features = c("numerics", "factors", "ordered"),
  fun = function(task, nselect, decreasing = TRUE, ...) {
    feats = getTaskFeatureNames(task)
    imp = order(feats, decreasing = decreasing)
    names(imp) = feats
    imp
  }
)

main.R:

library(parallelMap)
library(mlr)
library(kernlab)

parallelStartSocket(2)

parallelLibrary("mlr")
parallelSource("filter.R")

# set up svm with rbf kernel
svm.lrn = makeLearner("classif.ksvm",predict.type = "response")  

# wrap learner with filter
svm.lrn = makeFilterWrapper(svm.lrn, fw.method = "nonsense.filter")

# define feature selection parameters 

ps.svm = makeParamSet(
  makeDiscreteParam("fw.abs", values = seq(2, 3, 1)) 

)

# define inner search and evaluation strategy
ctrl.svm = makeTuneControlGrid()
inner.svm = makeResampleDesc("CV", iters = 5, stratify = TRUE)

svm.lrn = makeTuneWrapper(svm.lrn, resampling = inner.svm, par.set = ps.svm, 
                           control = ctrl.svm)

# set up outer resampling
outer.svm =  makeResampleDesc("CV", iters = 10, stratify = TRUE)

# run it...
run.svm = resample(svm.lrn, iris.task, resampling = outer.svm, extract = getTuneResult)

parallelStop()
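As a rough base-R analogy to the parallelLibrary() and parallelSource() calls in main.R, the sketch below loads a package and sources a file on every socket worker using the parallel package directly. The temporary file and the object my.filter.registry are hypothetical stand-ins for filter.R and its effect; this is not how parallelMap is implemented, only an illustration of the idea.

```r
library(parallel)

# Hypothetical stand-in for filter.R: sourcing this file defines an object.
script <- tempfile(fileext = ".R")
writeLines("my.filter.registry <- c(nonsense.filter = TRUE)", script)

cl <- makeCluster(2)  # two socket workers, each a fresh R process

# Load a package on every worker (the role parallelLibrary() plays).
invisible(clusterEvalQ(cl, library(tools)))

# Source the file on every worker (the role parallelSource() plays).
# source() evaluates in the worker's global environment by default.
invisible(clusterCall(cl, source, script))

# Each worker can now see what the file defined.
known.after <- unlist(clusterEvalQ(cl, exists("my.filter.registry")))

stopCluster(cl)
unlink(script)
print(known.after)
```

Keeping the definition in its own file means the same code can be run on the master and on every worker without duplicating it in the main script.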