mlr benchmark: AUC for a classification problem throws an error (requires predict type 'prob')

Time: 2018-10-22 15:50:11

Tags: r resampling auc mlr

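I am running a benchmark analysis with the mlr package and want to use AUC as my performance measure. I have specified predict.type = "prob", but I still get the following error message: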

0001: Error in FUN(X[[i]], ...) : 
  Measure auc requires predict type to be: 'prob'!

My code:

library(mlr)

#define measures
meas <- list(acc, mlr::auc, brier)

##random forest
p_length <- ncol(training_complete) - 1
lrn_RF = makeLearner("classif.randomForest", predict.type = "prob", par.vals = list("ntree" = 500L))
wcw_lrn_RF = makeWeightedClassesWrapper(lrn_RF, wcw.weight = 0.10) #weighted class wrapper
parsRF = makeParamSet(
  makeIntegerParam("mtry", lower = 1, upper = floor(0.4*p_length)),
  makeIntegerParam("nodesize", lower = 10, upper = 50))
tuneRF = makeTuneControlRandom(maxit = 100)
inner = makeResampleDesc("CV", iters = 2)
learnerRF = makeTuneWrapper(lrn_RF, resampling = inner, measures = meas, par.set = parsRF, control = tuneRF, show.info = FALSE)

##extreme gradient boosting
lrn_xgboost <- makeLearner(
  "classif.xgboost",
  predict.type = "prob", #before was response
  par.vals = list(objective = "binary:logistic", eval_metric = "error", nrounds = 200)) 
getParamSet("classif.xgboost")
pars_xgboost = makeParamSet(
  makeIntegerParam("nrounds", lower = 100, upper = 500),
  makeIntegerParam("max_depth", lower = 1, upper = 10),
  makeNumericParam("eta", lower = .1, upper = .5),
  makeNumericParam("lambda", lower = -1, upper = 0, trafo = function(x) 10^x))
tunexgboost = makeTuneControlRandom(maxit = 50) 
inner = makeResampleDesc("CV", iters = 2)
learnerxgboost = makeTuneWrapper(lrn_xgboost, resampling = inner, measures = meas, par.set = pars_xgboost, control = tunexgboost, show.info = FALSE)


##Benchmarking via outer resampling loop

#Learners to be compared
lrns = list(
  makeLearner("classif.featureless"), 
  learnerRF,
  learnerxgboost
)

#outer resampling strategy
rdesc = makeResampleDesc("CV", iters = 5) 

library(methods)
library(parallel)
library(parallelMap)

set.seed(123, "L'Ecuyer") 

parallelStartSocket(parallel::detectCores(), level = "mlr.resample")

churn_benchmarking <- benchmark(learners = lrns,
                                tasks = trainTask,
                                resamplings = rdesc,
                                models = FALSE,
                                measures = meas)

parallelStop()

Any hints are appreciated!
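As background for the error: AUC measures how well the predicted scores rank positive cases above negative ones, so it needs a continuous score (for example the predicted probability of the positive class) rather than a hard class label. The base-R sketch below is an illustration only, not mlr's own implementation; the function name `auc_rank` and the toy data are hypothetical. It computes a rank-based (Mann-Whitney) AUC and shows that thresholded labels lose most of the ranking information, while a constant score, such as the one a featureless learner produces, collapses to 0.5.

```r
# Rank-based (Mann-Whitney) AUC: the probability that a randomly chosen
# positive case scores higher than a randomly chosen negative case,
# with ties counted as 1/2. Hypothetical helper, for illustration only.
auc_rank <- function(scores, labels) {
  pos <- scores[labels == 1]
  neg <- scores[labels == 0]
  # Compare every positive score with every negative score
  cmp <- outer(pos, neg, function(p, n) (p > n) + 0.5 * (p == n))
  mean(cmp)
}

labels <- c(1, 1, 0, 1, 0, 0)
prob   <- c(0.9, 0.7, 0.6, 0.4, 0.3, 0.1)  # predicted P(class 1)
hard   <- as.integer(prob > 0.5)           # thresholded "response" predictions
const  <- rep(0.5, length(labels))         # constant, featureless-style score

print(auc_rank(prob, labels))   # uses the full ranking
print(auc_rank(hard, labels))   # ranking reduced to two levels
print(auc_rank(const, labels))  # every pair is a tie
```

With these toy numbers the probabilities give 8/9, the hard labels 2/3 and the constant score exactly 0.5.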

1 answer:

Answer 0 (score: 1)

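I can see one issue: your featureless learner does not provide probabilities. It is created with the default predict type ("response"), so mlr cannot compute auc for it.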

Instead, write makeLearner("classif.featureless", predict.type = "prob")
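A minimal, self-contained sketch of the fix, assuming mlr 2.x with its built-in `sonar.task` and the `rpart` package installed. Here `classif.rpart` is only a stand-in for the tuned random forest and xgboost learners from the question. The check before `benchmark()` reads each learner's `predict.type` field and stops early if any learner would return only hard class labels.

```r
library(mlr)

# Every learner in the benchmark must predict probabilities,
# otherwise mlr refuses to compute auc (and brier).
lrns <- list(
  makeLearner("classif.featureless", predict.type = "prob"),  # baseline, now with probabilities
  makeLearner("classif.rpart", predict.type = "prob")         # stand-in for the tuned learners
)

# Fail early, before a possibly long benchmark, if any learner predicts only labels
pred_types <- vapply(lrns, function(l) l$predict.type, character(1))
stopifnot(all(pred_types == "prob"))

meas <- list(acc, mlr::auc, brier)
rdesc <- makeResampleDesc("CV", iters = 3)

set.seed(123)
bmr <- benchmark(learners = lrns, tasks = sonar.task, resamplings = rdesc,
                 measures = meas, show.info = FALSE)

# One row per learner with the mean of each measure over the CV folds
perf <- getBMRAggrPerformances(bmr, as.df = TRUE)
print(perf)
```

The featureless baseline predicts the same class probabilities for every observation, so its AUC should sit at about 0.5. That is the intended reference point for the other learners.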