Question

I am trying to tune hyperparameters using nested cross-validation. This is the inner loop I have for two learners, lrn1 and lrn2:

inner = makeResampleDesc("CV", iters = 3L)

tune_lrn1 <- makeTuneWrapper(lrn1, resampling = inner, par.set = ps, control = ctrl)

tune_lrn2 <- makeTuneWrapper(lrn2, resampling = inner, par.set = ps, control = ctrl)

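Is there any way to set a fixed random seed before instantiating "inner", so that both learners always use exactly the same data partitions for hyperparameter evaluation?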

Answer 1

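There are two things you can do, but they might not fully meet your needs.

Fix the resampling for a given task

...or at least for a given n. In this example we use the task iris.task.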

inner_fixed = makeResampleInstance(inner, iris.task)
tune_lrn1 <- makeTuneWrapper(lrn1, resampling = inner_fixed, par.set = ps, control = ctrl)
tune_lrn2 <- makeTuneWrapper(lrn2, resampling = inner_fixed, par.set = ps, control = ctrl)

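If you want to apply this to multiple tasks, you have to handle it programmatically, creating one resampling instance per task.

Setting a seed can fail!

The following setting is already the default: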
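As a rough sketch of what "programmatically" could look like, the snippet below creates one fixed resampling instance per task and reuses it for both learners. The parameter sets ps1 and ps2, the choice of sonar.task as a second task, and the direct call to tuneParams() instead of a tune wrapper are illustrative assumptions, not part of the question. It covers only the non-nested case: a resampling instance is tied to a data set size, so inside an outer loop it would have to match the size of the outer training set.

```r
library(mlr)

# Inner resampling description, as in the question.
inner = makeResampleDesc("CV", iters = 3L)

# Learners and search spaces (illustrative choices, not from the question).
lrn1 = makeLearner("classif.rpart")
lrn2 = makeLearner("classif.svm")
ps1 = makeParamSet(makeNumericParam("cp", lower = 0.001, upper = 0.1))
ps2 = makeParamSet(
  makeNumericParam("cost", lower = -5, upper = 5, trafo = function(x) 2^x)
)
ctrl = makeTuneControlRandom(maxit = 4L)

# Hypothetical list of tasks to tune on.
tasks = list(iris = iris.task, sonar = sonar.task)

same_splits = sapply(tasks, function(task) {
  # Draw the splits once per task; both learners reuse this instance.
  inner_fixed = makeResampleInstance(inner, task)
  res1 = tuneParams(lrn1, task, resampling = inner_fixed, par.set = ps1,
    control = ctrl, show.info = FALSE)
  res2 = tuneParams(lrn2, task, resampling = inner_fixed, par.set = ps2,
    control = ctrl, show.info = FALSE)
  # Compare the training indices each tuning run actually used.
  identical(res1$resampling$train.inds, res2$resampling$train.inds)
})
```

Here same_splits holds one logical value per task, indicating whether both tuning runs saw the same training indices.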

ctrl = makeTuneControl*(same.resampling.instance = TRUE, ...)

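This means that all tuning evaluations are carried out on the same train/test splits. In other words, makeResampleInstance is called at the beginning of tune(). We could now use @pat-s's answer and set a seed, but that does not always work: some learners use the RNG during training, so the subsequent train/test splits will "diverge":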

library(mlr)
inner = makeResampleDesc("CV", iters = 3L)
task = iris.task
lrn1 = makeLearner("classif.rpart")
lrn2 = makeLearner("classif.svm")
ctrl = makeTuneControlRandom(same.resampling.instance = TRUE, budget = 4)

library(mlrHyperopt)
ps1 = getDefaultParConfig(lrn1)$par.set
ps2 = getDefaultParConfig(lrn2)$par.set

tune_lrn1 = makeTuneWrapper(lrn1, resampling = inner, par.set = ps1, control = ctrl)
tune_lrn2 = makeTuneWrapper(lrn2, resampling = inner, par.set = ps2, control = ctrl)
set.seed(1)
r1 = resample(tune_lrn1, resampling = cv10, task = iris.task, models = TRUE)
set.seed(1)
r2 = resample(tune_lrn2, resampling = cv10, task = iris.task, models = TRUE)

sapply(1:10, function(i) {
  identical(r2$models[[i]]$learner.model$opt.result$resampling$train.inds, r1$models[[i]]$learner.model$opt.result$resampling$train.inds)  
})

# [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
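
The mechanism behind this divergence can be illustrated in base R, without mlr. In the sketch below, draw_split() and run() are hypothetical helpers, and the runif(1) call stands in for a learner that consumes random numbers during training. This is only an analogy for what happens between two consecutive inner splits, not mlr's actual code path.

```r
# Hypothetical helper: draw training indices for a 2/3 holdout split.
draw_split = function(n) sample(n, size = floor(2 * n / 3))

run = function(uses_rng) {
  set.seed(1)
  first = draw_split(150)
  # Stand-in for a learner that draws random numbers while training.
  if (uses_rng) runif(1)
  second = draw_split(150)
  list(first = first, second = second)
}

a = run(uses_rng = FALSE)
b = run(uses_rng = TRUE)

# The first split depends only on the seed, so it matches.
first_same = identical(a$first, b$first)
# The second split is drawn from a shifted RNG state, so it is
# expected to differ.
second_same = identical(a$second, b$second)
```

Only the first split is protected by the seed. Every later split depends on how many random numbers the learner consumed in between, which is consistent with only the first outer fold matching in the output above.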
