Setting hyperparameters on an mlr learner after parameter tuning

Asked: 2018-07-06 09:45:32

Tags: r xgboost mlr

I am building a classification task in R with the mlr package and tuning my hyperparameters on a validation set. One of these hyperparameters is the percentage of variables to keep in importance-based feature selection (the chi.squared filter method):

library(mlr)

# Wrap xgboost in a chi-squared feature filter so that fw.perc (the share of
# features to keep) becomes tunable alongside xgboost's own hyperparameters
lrn = makeFilterWrapper(learner = "classif.xgboost", fw.method = "chi.squared")
params <- makeParamSet(
  makeDiscreteParam("booster", values = c("gbtree", "dart")),
  makeDiscreteParam("nrounds", values = 1000, tunable = FALSE),
  makeDiscreteParam("eta", values = c(0.1, 0.05, 0.2)),
  makeIntegerParam("max_depth", lower = 3L, upper = 10L),
  makeNumericParam("min_child_weight", lower = 1, upper = 10),
  makeNumericParam("subsample", lower = 0.5, upper = 1),
  makeNumericParam("colsample_bytree", lower = 0.5, upper = 1),
  makeDiscreteParam("fw.perc", values = seq(0.2, 1, 0.05)))
rdesc = makeResampleDesc("CV", iters = 5)
ctrl <- makeTuneControlRandom(maxit = 1L)
res = tuneParams(lrn, task = valTask2016, resampling = rdesc, par.set = params, control = ctrl)

I am not sure whether 5-fold cross-validation is really needed here, but the result object res gives me all the parameters I need, including fw.perc, which trims the feature set ranked by descending importance.
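(To inspect what came out of tuning: tuneParams returns a TuneResult, so something like the following shows the chosen values and their resampled performance.)

res$x  # named list of the tuned hyperparameters, including fw.perc
res$y  # cross-validated performance of that configuration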

My question is: how can I reuse these parameters for another resampling run (this time with subsampling), but on the training data? This is what I have:

rdesc = makeResampleDesc("Subsample", iters = 5, split = 0.8)
lrn = setHyperPars(makeLearner("classif.xgboost"), par.vals = res$x)
r = resample(lrn, trainTask2016, rdesc, measures = list(mmce, fpr, fnr, timetrain))

Here, valTask2016 is the classification task I use for parameter validation. I used createDummyFeatures for the one-hot encoding that XGBoost requires.
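(For completeness, a minimal sketch of that encoding step; valData2016 and the target name "Class" are placeholders rather than my actual names:)

valData2016 = createDummyFeatures(valData2016, target = "Class")  # one-hot encode the factor features
valTask2016 = makeClassifTask(data = valData2016, target = "Class")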

This is the error I get:

Error in setHyperPars2.Learner(learner, insert(par.vals, args)) :
  classif.xgboost: Setting parameter fw.perc without available description object!
  Did you mean one of these hyperparameters instead: booster eta alpha

1 Answer:

Answer 0 (score: 0)

I believe the reason you get this error is that your second learner is a "plain" xgboost learner, not an xgboost learner wrapped by a filter like the first one (makeFilterWrapper returns a Learner).
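A quick way to see the difference is to compare the two parameter sets with getParamSet; only the wrapped learner knows about the fw.* parameters:

getParamSet(makeLearner("classif.xgboost"))  # xgboost parameters only, no fw.*
getParamSet(makeFilterWrapper(makeLearner("classif.xgboost"), fw.method = "chi.squared"))
# the wrapped version additionally lists filter parameters such as fw.method and fw.perc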

So you have two options:

  1. Define a new param set for the second training run in which you "copy" only the parts of res$x that refer to xgboost itself, i.e. everything except fw.perc (see the sketch after this list).
  2. Wrap the second xgboost learner with the same filter.
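For the first option, a minimal sketch (assuming res$x is the tuning result from above) would be to drop the filter parameter before applying the rest to an unwrapped learner:

xgb.pars = res$x[names(res$x) != "fw.perc"]  # keep only xgboost's own hyperparameters
lrn = setHyperPars(makeLearner("classif.xgboost"), par.vals = xgb.pars)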

I hope this makes sense.

Edit: this works for the second option, using the Titanic dataset:

library(mlr)
library(dplyr)
library(titanic)

# 70/30 train/test split (note that `sample` shadows base::sample here)
sample <- sample.int(n = nrow(titanic_train), size = floor(.7 * nrow(titanic_train)), replace = FALSE)
train <- titanic_train[sample, ] %>%
  select(Pclass, Sex, Age, SibSp, Fare, Survived) %>%
  mutate(Sex = ifelse(Sex == 'male', 0, 1),
         Survived = as.factor(Survived))  # mlr classification tasks expect a factor target

lrn = makeFilterWrapper(learner = "classif.xgboost", fw.method = "chi.squared")

params <- makeParamSet(
  makeDiscreteParam("booster", values = c("gbtree", "dart")),
  makeDiscreteParam("nrounds", values = 1000, tunable = FALSE),
  makeDiscreteParam("eta", values = c(0.1, 0.05, 0.2)),
  makeIntegerParam("max_depth", lower = 3L, upper = 10L),
  makeNumericParam("min_child_weight", lower = 1, upper = 10),
  makeNumericParam("subsample", lower = 0.5, upper = 1),
  makeNumericParam("colsample_bytree", lower = 0.5, upper = 1),
  makeDiscreteParam("fw.perc", values = seq(0.2, 1, 0.05)))

classif.task <- mlr::makeClassifTask(data = train,
                                     target = "Survived",
                                     positive = "1")

rdesc = makeResampleDesc("CV", iters = 3)

ctrl <- makeTuneControlRandom(maxit = 2L)

res = tuneParams(lrn, task = classif.task, resampling = rdesc, par.set = params, control = ctrl)

##########################

test <- titanic_train[-sample, ] %>%
  select(Pclass, Sex, Age, SibSp, Fare, Survived) %>%
  mutate(Sex = ifelse(Sex == 'male', 0, 1),
         Survived = as.factor(Survived))

# Option 2: wrap the new learner with the same filter, so fw.perc is a known
# parameter, then set ALL tuned values from res$x at once
lrn2 = setHyperPars(makeFilterWrapper(learner = "classif.xgboost", fw.method = "chi.squared"), par.vals = res$x)

classif.task2 <- mlr::makeClassifTask(data = test,
                                      target = "Survived",
                                      positive = "1")

rdesc = makeResampleDesc("CV", iters = 3)
r = resample(learner = lrn2, task = classif.task2, resampling = rdesc, show.info = TRUE, models = TRUE)
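If you then want a single final model instead of resampled estimates, a natural follow-up (a sketch using the same mlr calls) is to train the tuned, wrapped learner on the training task and evaluate it on the held-out task:

mod = train(lrn2, classif.task)            # fit once on the training split
pred = predict(mod, task = classif.task2)  # predict on the held-out split
performance(pred, measures = list(mmce, fpr, fnr))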