R - MLR - Classifier Calibration - Benchmark Results

Date: 2017-01-17 02:04:42

Tags: mlr

I have run a benchmark experiment with nested cross-validation (tuning + performance measurement) on a classification problem and would like to create calibration plots.

If I pass the benchmark result object to generateCalibrationData, what does plotCalibration do? Does it average over the resampling iterations? If so, how?

As with the ROC curves from generateThreshVsPerfData, would it make sense to use an aggregate = FALSE option to understand the variability across folds?

In response to @Zach's request for a reproducible example, I (the OP) have edited my original post as follows:

EDIT: Reproducible Example


Running the example below produces the following:

[Image: Aggregated Calibration Plot]

Attempting to un-aggregate resulted in:

# Practice Data

library("mlr")
library("ROCR")
library(mlbench)

data(BreastCancer)
dim(BreastCancer)
levels(BreastCancer$Class)
head(BreastCancer)

# Drop the Id column and two of the predictors (Bare.nuclei contains missing values)
BreastCancer <- BreastCancer[, -c(1, 6, 7)]

# Convert the ordered factor predictors to plain (unordered) factors
BreastCancer$Cl.thickness <- as.factor(unclass(BreastCancer$Cl.thickness))
BreastCancer$Cell.size <- as.factor(unclass(BreastCancer$Cell.size))
BreastCancer$Cell.shape <- as.factor(unclass(BreastCancer$Cell.shape))
BreastCancer$Marg.adhesion <- as.factor(unclass(BreastCancer$Marg.adhesion))
head(BreastCancer)

# Define Nested Cross-Validation Strategy

# Inner loop for tuning, outer loop for performance estimation; both stratified by class
cv.inner <- makeResampleDesc("CV", iters = 2, stratify = TRUE)
cv.outer <- makeResampleDesc("CV", iters = 6, stratify = TRUE)

# Define Performance Measures

perf.measures <- list(auc, mmce)

# Create Task

bc.task <- makeClassifTask(id = "bc",
                           data = BreastCancer, 
                           target = "Class", 
                           positive = "malignant")

# Create Tuned KSVM Learner

ksvm <- makeLearner("classif.ksvm", 
                    predict.type = "prob")

ksvm.ps <- makeParamSet(makeDiscreteParam("C", values = 2^(-2:2)),
                        makeDiscreteParam("sigma", values = 2^(-2:2)))

ksvm.ctrl <- makeTuneControlGrid()

ksvm.lrn <- makeTuneWrapper(ksvm, 
                           resampling = cv.inner,
                           measures = perf.measures,
                           par.set = ksvm.ps, 
                           control = ksvm.ctrl, 
                           show.info = FALSE)

# Create Tuned Random Forest Learner

rf <- makeLearner("classif.randomForest", 
                  predict.type = "prob", 
                  fix.factors.prediction = TRUE)

rf.ps <- makeParamSet(makeDiscreteParam("mtry", values = c(2, 3, 5)))

rf.ctrl <- makeTuneControlGrid()

rf.lrn <- makeTuneWrapper(rf, 
                         resampling = cv.inner,
                         measures = perf.measures,
                         par.set = rf.ps, 
                         control = rf.ctrl, 
                         show.info = FALSE)

# Run Cross-Validation Experiments

bc.lrns <- list(ksvm.lrn, rf.lrn)

bc.bmr <- benchmark(learners = bc.lrns, 
                    tasks = bc.task, 
                    resampling = cv.outer, 
                    measures = perf.measures, 
                    show.info = FALSE)

# Calibration Charts

bc.cal <- generateCalibrationData(bc.bmr)
plotCalibration(bc.cal)
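
For the ROC part of the question, the per-fold curves I have in mind would look roughly like this (fpr and tpr are standard mlr measures; whether plotROCCurves displays the unaggregated data per iteration may depend on the mlr version):

# ROC Curves Per Fold (sketch)

bc.roc <- generateThreshVsPerfData(bc.bmr,
                                   measures = list(fpr, tpr),
                                   aggregate = FALSE)
plotROCCurves(bc.roc)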

1 Answer:

Answer 0 (score: 0)

No, plotCalibration does not do any averaging, but it can draw smoothed calibration lines.
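
For example (the smooth argument below is my reading of the current plotCalibration interface; check ?plotCalibration for your mlr version):

# Draw a smoothed line through the binned calibration points
# instead of connecting them directly; the underlying data are unchanged
plotCalibration(bc.cal, smooth = TRUE)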

If you call generateCalibrationData on a benchmark result object, it treats every iteration of the resampled predictions as exchangeable and computes the calibration across all of the resampled predictions within each probability bin.
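
As a sketch of that pooling with the example from the question (the explicit numeric breaks vector is my own choice; the default is the "Sturges" rule):

# All outer-fold predictions are pooled, then binned by predicted probability;
# here ten equal-width bins are used instead of the default "Sturges" rule
bc.cal10 <- generateCalibrationData(bc.bmr, breaks = seq(0, 1, by = 0.1))
str(bc.cal10)             # inspect the pooled, binned calibration data
plotCalibration(bc.cal10)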

Yes, it might make sense to have an option to generate a non-aggregated calibration data object and to be able to plot it. You are welcome to open an issue on GitHub, but TBH it would be fairly low on my priority list.
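
In the meantime, a manual workaround could look roughly like the sketch below. It assumes getBMRPredictions(..., as.df = TRUE) returns one row per resampled prediction with columns learner.id, iter, truth and prob.malignant; the exact column names may differ between mlr versions.

# Per-Fold Calibration (manual sketch, not an mlr feature)

library(ggplot2)

preds <- getBMRPredictions(bc.bmr, as.df = TRUE)

# Bin the predicted probabilities into ten equal-width bins
preds$bin <- cut(preds$prob.malignant, breaks = seq(0, 1, by = 0.1),
                 include.lowest = TRUE)

# Observed event rate and mean predicted probability per learner, fold and bin
cal.by.fold <- aggregate(cbind(observed = truth == "malignant",
                               predicted = prob.malignant) ~ learner.id + iter + bin,
                         data = preds, FUN = mean)

# One calibration line per outer fold, faceted by learner
ggplot(cal.by.fold, aes(predicted, observed, colour = factor(iter))) +
  geom_line() +
  geom_abline(linetype = "dashed") +
  facet_wrap(~learner.id) +
  labs(x = "Mean predicted probability",
       y = "Observed proportion malignant",
       colour = "Outer fold")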