Training a caret model using F1 as the metric

Time: 2017-12-14 19:27:00

Tags: r r-caret

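I am trying to fit a random forest model to my dataset, and I want to select the best model based on the F1 score. I found a post here that describes the necessary code. I tried to replicate that code, but I get the error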

Error in { : task 1 failed - "could not find function "F1_Score""

when I run the train function. (FYI, the variable I want to predict ("pass") is a two-class factor with the levels "fail" and "pass".)

See the code below:

library(MLmetrics)
library(caret)
library(doSNOW)

f1 <- function(data, lev = NULL, model = NULL) {
  f1_val <- F1_Score(y_pred = data$pred, y_true = data$obs, positive = lev[1])
  c(F1 = f1_val)
}



train.control <- trainControl(method = "repeatedcv",
                              number = 10,
                              repeats = 3,
                              classProbs = TRUE,
                              summaryFunction = f1,
                              search = "grid")


tune.grid <- expand.grid(.mtry = seq(from = 1, to = 10, by = 1))


cl <- makeCluster(3, type = "SOCK")
registerDoSNOW(cl)
random.forest.orig <- train(pass ~ manufacturer+meter.type+premise+size+age+avg.winter+totalizer, 
                     data = meter.train,
                     method = "rf",
                     tuneGrid = tune.grid,
                     metric = "F1",
                     weights = model_weights,
                     trControl = train.control)
stopCluster(cl)

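2 Answers:

Answer 0: (score: 1)

I rewrote the f1 function without using the MLmetrics library, and it seems to work. See below for working code that computes the F1 score: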

f1 <- function (data, lev = NULL, model = NULL) {
  # Precision and recall must be computed for the same positive class ("pass")
  precision <- posPredValue(data$pred, data$obs, positive = "pass")
  recall  <- sensitivity(data$pred, data$obs, positive = "pass")
  f1_val <- (2 * precision * recall) / (precision + recall)
  names(f1_val) <- c("F1")
  f1_val
}

train.control <- trainControl(method = "repeatedcv",
                          number = 10,
                          repeats = 3,
                          classProbs = TRUE,
                          #sampling = "smote",
                          summaryFunction = f1,
                          search = "grid")


tune.grid <- expand.grid(.mtry = seq(from = 1, to = 10, by = 1))


cl <- makeCluster(3, type = "SOCK")
registerDoSNOW(cl)
random.forest.orig <- train(pass ~ manufacturer+meter.type+premise+size+age+avg.winter+totalizer, 
                 data = meter.train,
                 method = "rf",
                 tuneGrid = tune.grid,
                 metric = "F1",
                 trControl = train.control)
stopCluster(cl)
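
The F1 calculation used by the f1 function in this answer can be checked by hand. The following is a minimal base R sketch, not part of the original answer: the toy vectors and the helper name f1_by_hand are made up for illustration, and no caret functions are involved. It also shows that the score depends on which level is treated as positive, so the same positive level should be used for both precision and recall.

```r
# Toy observed and predicted classes (hypothetical data for illustration).
lv   <- c("fail", "pass")
obs  <- factor(c("pass", "pass", "pass", "fail", "fail", "pass"), levels = lv)
pred <- factor(c("pass", "fail", "pass", "pass", "fail", "pass"), levels = lv)

# Hypothetical helper: F1 for a chosen positive class, from raw counts.
f1_by_hand <- function(pred, obs, positive) {
  tp <- sum(pred == positive & obs == positive)  # true positives
  fp <- sum(pred == positive & obs != positive)  # false positives
  fn <- sum(pred != positive & obs == positive)  # false negatives
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  2 * precision * recall / (precision + recall)
}

f1_pass <- f1_by_hand(pred, obs, positive = "pass")  # 0.75
f1_fail <- f1_by_hand(pred, obs, positive = "fail")  # 0.5
```

With "pass" as the positive class there are 3 true positives, 1 false positive and 1 false negative, giving precision = recall = 0.75. With "fail" as the positive class the counts are 1, 1 and 1, giving 0.5.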

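Answer 1: (score: 1)

I had exactly the same error. It also occurs when I use other functions from the MLmetrics package, such as the Precision function.

I solved it by accessing the F1_Score function with the double colon operator (::).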

f1 <- function(data, lev = NULL, model = NULL) {
        f1_val <- MLmetrics::F1_Score(y_pred = data$pred,
                                      y_true = data$obs,
                                      positive = lev[1])
        c(F1 = f1_val)
}

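By writing MLmetrics::F1_Score, you explicitly use the F1_Score function from the MLmetrics package.

One advantage of the MLmetrics package is that its functions can handle variables with more than 2 levels.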
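
A likely explanation, which is an assumption and not something stated in the question, is that the summary function runs on the parallel workers, and calling library(MLmetrics) in the main session does not attach the package there. The sketch below reproduces that pattern using only packages that ship with R: parallel stands in for doSNOW, and tools::toTitleCase stands in for MLmetrics::F1_Score. Neither substitution comes from the original post.

```r
library(parallel)
library(tools)  # attached in the main session only, not on the workers

cl <- makeCluster(2)

# Unqualified call: each worker looks up toTitleCase on its own search path,
# where tools is not attached, so an error is expected here.
unqualified <- tryCatch(
  parLapply(cl, "f1 score", function(s) toTitleCase(s)),
  error = function(e) conditionMessage(e)
)

# Qualified call: `::` loads the namespace on the worker and finds the function.
qualified <- parLapply(cl, "f1 score", function(s) tools::toTitleCase(s))

stopCluster(cl)
```

If this explanation holds, unqualified ends up holding an error message about a missing function, while qualified holds the converted string. Attaching the package on every worker, for example with clusterEvalQ(cl, library(MLmetrics)), would be another way to make the unqualified name visible.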