Caret's train() & resamples() reverse a GLM's sensitivity/specificity

Date: 2017-06-06 05:52:13

Tags: r r-caret glm resampling

The documentation for the glm() function says of a factor response variable:

  "the first level denotes failure and all others success."

I assume the same applies to caret's train() with method = 'glm', since it calls glm() under the hood.

So, to produce an interpretable model consistent with other models (i.e., coefficients that correspond to the "success" event), I must follow this convention.

The problem is that even though glm() and caret's train() treat the second factor level as success, caret's resamples() function (and the $resample variable) still treat the first level as success/positive. So if I want to use the sensitivity and specificity from resamples(), they are the reverse of what they should be when comparing against other models.
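The level dependence is easy to see with caret's sensitivity() function, which defaults to treating the first factor level as the positive class. A minimal sketch with a tiny made-up prediction vector (the data here is purely illustrative):

```r
library(caret)

# Toy data: "No" is the first level, "Yes" the second
obs  <- factor(c("No", "No", "Yes", "Yes"), levels = c("No", "Yes"))
pred <- factor(c("No", "Yes", "Yes", "Yes"), levels = c("No", "Yes"))

sensitivity(pred, obs)                   # default: "No" is positive -> 0.5
sensitivity(pred, obs, positive = "Yes") # "Yes" is positive -> 1.0
```

The same metric takes different values depending solely on which level is declared positive, which is exactly the discrepancy between glm()'s convention and resamples()'s.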

Example:

install.packages('ISLR')
library('ISLR')
library('caret') # needed for trainControl(), train(), etc. below
summary(Default)
levels(Default$default) # 'Yes' is the second level of the factor
glm_model <- glm(default ~ ., family = "binomial", data = Default)
summary(glm_model)

train_control <- trainControl(
    summaryFunction = twoClassSummary,
    classProbs = TRUE,
    method = 'repeatedcv',
    number = 5,
    repeats = 5,
    verboseIter = FALSE,
    savePredictions = TRUE)
set.seed(123)
caret_model <- train(default ~ ., data = Default, method = 'glm', metric = 'ROC',
    preProc = c('nzv', 'center', 'scale', 'knnImpute'),
    trControl = train_control)
summary(caret_model)
caret_model # shows Sens of ~0.99 and Spec of ~0.32
caret_model$resample # same, but per fold/repeat; by now the resamples are
                     # already the opposite of what they should be, which will
                     # propagate to the resamples() method. No way to specify
                     # the positive/success class in train()?

confusionMatrix(data = predict(caret_model, Default),
                reference = Default$default,
                positive = 'Yes') # with 'Yes' as the positive class, the true
                                  # sensitivity and specificity are computed,
                                  # but no way to do this for resamples()?

I can see the correct sens/spec using confusionMatrix with positive = 'Yes', but what is the solution for resamples() so that I can accurately compare this model against others?
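One workaround (a sketch, not from the original post) is to recompute fold-level metrics yourself from the held-out predictions that savePredictions = TRUE stores in caret_model$pred, passing the positive class explicitly; the caret_model object and its Resample column are assumed from the code above:

```r
library(caret)

# caret_model$pred holds columns pred, obs, and Resample when
# savePredictions = TRUE was set in trainControl()
by_fold <- split(caret_model$pred, caret_model$pred$Resample)

# Per-fold sensitivity with "Yes" explicitly declared positive
sens_yes <- sapply(by_fold, function(d)
  sensitivity(d$pred, d$obs, positive = "Yes"))
summary(sens_yes)
```

This sidesteps resamples() entirely, at the cost of computing the comparison table yourself.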

1 Answer:

Answer 0 (score: 0)

The following will reverse the sensitivity:

library('forcats') # for fct_relevel()
temp <- Default
temp$default <- fct_relevel(temp$default, "Yes")
levels(temp$default)    # "Yes" is now the first level
levels(Default$default) # original ordering is unchanged

caret_model <- train(relevel(default, ref = "Yes") ~ ., data = temp,
    method = 'glm', metric = 'ROC',
    preProc = c('nzv', 'center', 'scale', 'knnImpute'),
    trControl = train_control)
summary(caret_model)
caret_model 

This is based on page 272 of the book *Applied Predictive Modeling*:

  "The glm() function models the probability of the second factor level, so the function relevel() is used to temporarily reverse the factor levels."
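As a minimal sketch of the level flip the book describes, base R's relevel() gives the same result as the forcats::fct_relevel() call used above (assuming the ISLR Default data):

```r
library(ISLR)

levels(Default$default)            # "No" "Yes": glm() models P(Yes)
flipped <- relevel(Default$default, ref = "Yes")
levels(flipped)                    # "Yes" "No": glm() now models P(No),
                                   # but caret's resample Sens/Spec
                                   # refer to "Yes"
```

The trade-off: after releveling, the resampled sensitivity/specificity refer to the class of interest, while the sign of the glm() coefficients flips accordingly.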