Averaging model predictions in R with the mlr package

Time: 2017-07-10 21:29:55

Tags: r h2o mlr

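Is there a way to combine multiple predictions from different models in mlr into a single averaged prediction, so that it can be used to compute performance measures and so on?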

library(mlr)
data(iris)
iris2 <- iris
iris2$Species <- ifelse(iris$Species=="setosa", "ja", "nein")
task = makeClassifTask(data = iris2, target = "Species")
lrn = makeLearner("classif.h2o.deeplearning", predict.type="prob")
model1 = train(lrn, task)
model2 = train(lrn, task)
pred1 = predict(model1, newdata=iris2)
pred2 = predict(model2, newdata=iris2)
performance(pred1, measures = auc)
g = generateThreshVsPerfData(pred1)
plotThreshVsPerf(g)

A workaround that shows what I mean might be:

pred_avg = pred1
pred_avg$data[,c("prob.ja","prob.nein")] = (pred1$data[,c("prob.ja","prob.nein")] + 
                                              pred2$data[,c("prob.ja","prob.nein")])/2
performance(pred_avg, measures = auc)
g_avg = generateThreshVsPerfData(pred_avg)
plotThreshVsPerf(g_avg)

Is there a way to do this without the workaround, and does this workaround have any unwanted side effects?
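One possible side effect of the workaround, as far as I can tell: only the probability columns are overwritten, so the `response` column copied from `pred1` no longer has to agree with the averaged probabilities. AUC is computed from the probabilities and should be unaffected, but measures based on the predicted class could silently use stale labels. The sketch below illustrates this in base R with a hand-made data frame standing in for `pred$data`; the values, the 0.5 cutoff, and the names `p1`, `p2`, `avg`, and `stale` are made up for illustration.

```r
# Hypothetical stand-ins for pred1$data and pred2$data (values are invented)
p1 <- data.frame(truth = c("ja", "nein", "ja"),
                 prob.ja = c(0.9, 0.2, 0.6),
                 prob.nein = c(0.1, 0.8, 0.4),
                 response = c("ja", "nein", "ja"),
                 stringsAsFactors = FALSE)
p2 <- p1
p2$prob.ja <- c(0.7, 0.4, 0.2)
p2$prob.nein <- 1 - p2$prob.ja
p2$response <- ifelse(p2$prob.ja > 0.5, "ja", "nein")

# Same averaging step as in the workaround
cols <- c("prob.ja", "prob.nein")
avg <- p1
avg[, cols] <- (p1[, cols] + p2[, cols]) / 2

# The response column is still the one copied from p1
stale <- avg$response

# Recompute the class labels from the averaged probabilities
avg$response <- ifelse(avg$prob.ja > 0.5, "ja", "nein")

# Rows where the copied labels disagree with the averaged probabilities
which(stale != avg$response)
```

With a real mlr prediction object, calling `setThreshold(pred_avg, 0.5)` after the averaging step should recompute the response in the same way, though I have not checked this against every mlr version.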

1 answer:

Answer 0: (score: 1)

It sounds like you are looking for a stacked learner (makeStackedLearner), which is mlr's way of building ensembles.

From the documentation:

 # Regression
  data(BostonHousing, package = "mlbench")
  tsk = makeRegrTask(data = BostonHousing, target = "medv")
  base = c("regr.rpart", "regr.svm")
  lrns = lapply(base, makeLearner)
  m = makeStackedLearner(base.learners = lrns,
    predict.type = "response", method = "average")
  tmp = train(m, tsk)
  res = predict(tmp, tsk)
# Prediction: 506 observations
# predict.type: response
# threshold: 
# time: 0.02
#   id truth response
# 1  1  24.0 27.33742
# 2  2  21.6 22.08853
# 3  3  34.7 33.52007
# 4  4  33.4 32.49923
# 5  5  36.2 32.67973
# 6  6  28.7 22.99323
# ... (506 rows, 3 cols)

performance(res, rmse)
#     rmse 
# 3.138981
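
The documentation example above is a regression task. For the binary classification setup in the question, a minimal sketch might look like the following. It assumes `classif.rpart` and `classif.lda` as base learners (picked only for illustration in place of `classif.h2o.deeplearning`; they need the rpart and MASS packages), and the names `base`, `stacked`, `model`, and `pred` are mine. Treat it as a starting point rather than a tested recipe for every mlr version.

```r
library(mlr)

# Same two-class recoding as in the question, with an explicit factor target
iris2 <- iris
iris2$Species <- factor(ifelse(iris$Species == "setosa", "ja", "nein"))
task = makeClassifTask(data = iris2, target = "Species")

# Base learners must predict probabilities so that they can be averaged
base = lapply(c("classif.rpart", "classif.lda"), makeLearner,
  predict.type = "prob")

# method = "average" averages the base learners' predictions
stacked = makeStackedLearner(base.learners = base,
  predict.type = "prob", method = "average")

model = train(stacked, task)
pred = predict(model, newdata = iris2)

# The result is an ordinary prediction object, so the usual tools apply
performance(pred, measures = auc)
g = generateThreshVsPerfData(pred)
plotThreshVsPerf(g)
```

Because the averaging happens inside the learner, the same stacked learner can also be passed to `resample()`, which avoids evaluating on the training data as both the question and this sketch do.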