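Specifying the positive class of an outcome variable in caret train()

Time: 2017-07-26 16:48:52

Tags: r r-caret

I would like to know whether there is a way to specify, within caret's train() function, which class of the outcome variable is the positive one. A minimal example: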

library(caret)
library(dplyr)  # provides %>% and mutate()

# Settings
ctrl <- trainControl(method = "repeatedcv", number = 10, savePredictions = TRUE, summaryFunction = twoClassSummary, classProbs = TRUE)

# Data
data <- mtcars %>% mutate(am = factor(am, levels = c(0,1), labels = c("automatic", "manual"), ordered = TRUE))

# Train
set.seed(123)
model1 <- train(am ~ disp + wt, data = data, method = "glm", family = "binomial", trControl = ctrl, tuneLength = 5)

# Data (factor ordering switched)
data <- mtcars %>% mutate(am = factor(am, levels = c(1,0), labels = c("manual", "automatic"), ordered = TRUE))

# Train
set.seed(123)
model2 <- train(am ~ disp + wt, data = data, method = "glm", family = "binomial", trControl = ctrl, tuneLength = 5)

# Specificity and sensitivity are switched
model1
model2

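If you run the code, you will notice that the specificity and sensitivity metrics are "switched" between the two models. It looks like train() takes the first level of the factor outcome variable as the positive outcome. Is there a way to specify the positive class in the function itself, so that I get the same results regardless of how the outcome factor is ordered? I tried adding positive = "manual", but that results in an error.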

2 answers:

Answer 0 (score: 1)

I believe @Johannes's answer is an example of over-engineering a simple process.

Just reverse the order of the factor levels:

   df$target <- factor(df$target, levels=rev(levels(df$target)))
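As a minimal sketch of what that one-liner does, the following uses a made-up vector (`target` and `flipped` are illustrative names, not objects from the question). Only the level order changes, and therefore which level comes first; the observations themselves stay the same.

```r
# Toy outcome using the level names from the question (values are invented)
target <- factor(c("automatic", "manual", "manual", "automatic"),
                 levels = c("automatic", "manual"))

# Reverse the level order; the underlying observations are untouched
flipped <- factor(target, levels = rev(levels(target)))

levels(target)[1]   # first level before reversing
levels(flipped)[1]  # first level after reversing

# The data themselves are identical
identical(as.character(target), as.character(flipped))
```

Since the first level is what ends up being treated as the positive class, this is enough when you are free to modify the data before calling train().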

Answer 1 (score: 0)

The problem is not in the function train() but in the function twoClassSummary(), which looks like this:

function (data, lev = NULL, model = NULL) 
{
  lvls <- levels(data$obs)

  [...]    

  out <- c(rocAUC, 
           sensitivity(data[, "pred"], data[, "obs"], 
             lev[1]),  # Hard coded positive class
           specificity(data[, "pred"], data[, "obs"], 
             lev[2])) # Hard coded negative class
  names(out) <- c("ROC", "Sens", "Spec")
  out
}

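This is a fairly small wrapper, so we can fix it! The order of the levels passed on to sensitivity() and specificity() is hard-coded here. To fix this, you can write your own summary function based on twoClassSummary().

sensitivity() and specificity() take the name of the positive and the negative level, respectively (a suboptimal design choice). We therefore include these two arguments in the custom function. Further down, we pass them on to the respective functions to solve the problem.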

customTwoClassSummary <- function(data, lev = NULL, model = NULL, positive = NULL, negative = NULL) 
{
  lvls <- levels(data$obs)
  if (length(lvls) > 2) 
    stop(paste("Your outcome has", length(lvls), "levels. The twoClassSummary() function isn't appropriate."))
  caret:::requireNamespaceQuietStop("ModelMetrics")
  if (!all(levels(data[, "pred"]) == lvls)) 
    stop("levels of observed and predicted data do not match")
  # Fall back to the level order when no class is given explicitly
  if (is.null(positive)) positive <- lvls[1]
  if (is.null(negative)) negative <- lvls[2]
  # ROC AUC, computed as in twoClassSummary()
  rocAUC <- ModelMetrics::auc(ifelse(data$obs == lev[2], 0, 
                                     1), data[, lvls[1]])
  out <- c(rocAUC, 
           # Only change happens here!
           sensitivity(data[, "pred"], data[, "obs"], positive = positive), 
           specificity(data[, "pred"], data[, "obs"], negative = negative))
  names(out) <- c("ROC", "Sens", "Spec")
  out
}
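As a quick sanity check of the positive and negative arguments that sensitivity() and specificity() accept, here is a minimal sketch on made-up predictions. The vectors `obs` and `pred` and the helper `flip()` are invented for illustration and are not taken from the mtcars models. With the class named explicitly, the metrics should not depend on the level order, whereas the default follows the first level.

```r
library(caret)

# Invented observed and predicted classes, "automatic" as first level
obs  <- factor(c("manual", "manual", "manual", "automatic", "automatic"),
               levels = c("automatic", "manual"))
pred <- factor(c("manual", "manual", "automatic", "automatic", "manual"),
               levels = c("automatic", "manual"))

# Hypothetical helper: the same data with the level order reversed
flip <- function(f) factor(f, levels = rev(levels(f)))

# Explicit classes: independent of the level order
sens_a <- sensitivity(pred, obs, positive = "manual")
sens_b <- sensitivity(flip(pred), flip(obs), positive = "manual")
spec_a <- specificity(pred, obs, negative = "automatic")
spec_b <- specificity(flip(pred), flip(obs), negative = "automatic")

# Default: the first level of the reference is treated as positive
sens_default <- sensitivity(pred, obs)

c(sens_a = sens_a, sens_b = sens_b, sens_default = sens_default)
c(spec_a = spec_a, spec_b = spec_b)
```

Here sens_a and sens_b agree, while sens_default describes the "automatic" class instead, which is the behaviour the question observed.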

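But how do we specify these options without changing more code inside the package? By default, caret does not pass extra options on to the summary function. We therefore wrap the function in an anonymous function in the call to trainControl():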

ctrl <- trainControl(method = "repeatedcv", number = 10, savePredictions = TRUE, 
                     # Trick: fix some arguments of a function call by wrapping it
                     summaryFunction = function(...) customTwoClassSummary(..., 
                                       positive = "manual", negative="automatic"), 
                     classProbs = TRUE)

The ... argument ensures that all other arguments that caret passes to the anonymous function are forwarded to customTwoClassSummary().
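The forwarding trick is plain R and can be tried without caret. In this minimal sketch, `summarise_scores()` and `apply_summary()` are hypothetical stand-ins for a summary function with extra options and for a caller that, like caret, never supplies those options itself.

```r
# Hypothetical summary function with two extra options
summarise_scores <- function(x, scale = 1, offset = 0) {
  mean(x) * scale + offset
}

# Hypothetical caller that only ever passes the data argument
apply_summary <- function(f, x) f(x)

# Wrapper that forwards whatever it receives and fixes the extra options
fixed_summary <- function(...) summarise_scores(..., scale = 10, offset = 1)

result_default <- apply_summary(summarise_scores, c(1, 2, 3))
result_fixed   <- apply_summary(fixed_summary, c(1, 2, 3))

result_default
result_fixed
```

The caller is unchanged in both cases; only the function handed to it differs, which is why no code inside the package needs to be touched.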