Question

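The documentation for the glm() function says the following about a factor response variable:

the first level denotes failure and all others success.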
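This convention can be checked on a small simulated data set. The sketch below uses made-up data (the variables x and y are hypothetical, not part of the Default data) and only base R; it suggests that the fitted probabilities from glm() refer to the second factor level.

```r
# Toy data (hypothetical): y is a factor with levels "No" < "Yes",
# and larger x makes "Yes" more likely.
set.seed(1)
x <- rnorm(200)
y <- factor(ifelse(x + rnorm(200) > 0, "Yes", "No"), levels = c("No", "Yes"))

fit <- glm(y ~ x, family = binomial)

# A positive slope means larger x raises the modelled probability,
# i.e. the probability of the second level ("Yes").
slope <- coef(fit)["x"]
print(slope)

# Mean fitted probability per observed class: it should be higher
# for the "Yes" cases if the model targets P(y == "Yes").
class_means <- tapply(fitted(fit), y, mean)
print(class_means)
```

If the first level were treated as success instead, the slope would come out negative and the "No" cases would have the higher mean fitted value.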

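When using train(), I assume caret calls glm() under the hood for method = 'glm', so the same convention applies.

Therefore, to produce an interpretable model that is consistent with other models (i.e. the coefficients correspond to the success event), I have to follow this convention.

The problem is that even though glm() and caret's train() treat the second factor level as success, caret's resamples() function (and the $resample variable) still treats the first level as success / positive. So if I want to use resamples() to compare against other models, sensitivity and specificity are the opposite of what they should be.

Example: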

install.packages('ISLR')
library('ISLR')
library('caret') # provides trainControl(), train(), confusionMatrix()
summary(Default)
levels(Default$default) # 'Yes' is the second level of the factor
glm_model <- glm(default ~ ., family = "binomial", data = Default)
summary(glm_model)

train_control <- trainControl(
    summaryFunction = twoClassSummary,
    classProbs = TRUE,
    method = 'repeatedcv',
    number = 5,
    repeats = 5,
    verboseIter = FALSE,
    savePredictions = TRUE)
set.seed(123)
caret_model <- train(default ~ ., data = Default, method = 'glm', metric='ROC', preProc=c('nzv', 'center', 'scale', 'knnImpute'), trControl = train_control)
summary(caret_model)
caret_model # shows Sens of ~0.99 and Spec of ~0.32
caret_model$resample # shows same, but for each fold/repeat; by now, resamples are already the opposite of what they should be, which will propagate to resamples() method, no way to specify positive/success class in train()?

confusionMatrix(data = predict(caret_model, Default), reference = Default$default, positive = 'Yes') # once I set 'Yes' as positive class, the true sensitivity and specificity are calculated, but no way to do this for resamples()?

I can see the correct sens / spec using confusionMatrix() with positive = 'Yes', but what is the solution for resamples(), so that I can accurately compare this model against other models?
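To make the swap concrete, here is a minimal base R sketch with made-up predictions (truth, pred, and the helper recall() are hypothetical and not part of caret). As far as I understand it, twoClassSummary takes the first factor level as the event class, so its "Sens" is the recall of 'No' here, which is exactly what positive = 'Yes' would report as specificity.

```r
# Hypothetical truth and predictions, both with levels "No" < "Yes"
truth <- factor(c("No", "No", "No", "No", "No", "No", "Yes", "Yes", "Yes", "Yes"),
                levels = c("No", "Yes"))
pred  <- factor(c("No", "No", "No", "No", "No", "Yes", "Yes", "No", "No", "No"),
                levels = c("No", "Yes"))

# Recall of one class: share of that class's cases that were predicted correctly
recall <- function(pred, truth, class) mean(pred[truth == class] == class)

# Positive class = first level ("No")
sens_first <- recall(pred, truth, "No")
spec_first <- recall(pred, truth, "Yes")

# Positive class = "Yes"
sens_yes <- recall(pred, truth, "Yes")
spec_yes <- recall(pred, truth, "No")

print(c(sens_first = sens_first, spec_first = spec_first))
print(c(sens_yes = sens_yes, spec_yes = spec_yes))
```

The two pairs contain the same numbers with the labels exchanged, which mirrors the high Sens and low Spec reported by the resamples above.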

Answer 1

The following will invert the sensitivity:

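library(forcats) # provides fct_relevel()

temp <- Default
temp$default <- fct_relevel(temp$default, "Yes") # move "Yes" to the first level
levels(temp$default)
levels(Default$default)

# temp$default is already releveled, so use it directly as the response.
# Wrapping it in relevel() on the left-hand side would let `.` pull the raw default column in as a predictor.
caret_model <- train(default ~ ., data = temp, method = 'glm', metric='ROC', preProc=c('nzv', 'center', 'scale', 'knnImpute'), trControl = train_control)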
summary(caret_model)
caret_model

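This is based on page 272 of the book Applied Predictive Modeling:

The glm() function models the probability of the second factor level, so the function relevel() is used to temporarily reverse the factor levels.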
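One side effect worth keeping in mind: after releveling, glm() models the probability of what is now the second level ('No'), so the coefficients change sign. A minimal sketch on made-up data (x and y are hypothetical, base R only) illustrates this:

```r
# Toy data (hypothetical): levels "No" < "Yes"
set.seed(1)
x <- rnorm(200)
y <- factor(ifelse(x + rnorm(200) > 0, "Yes", "No"), levels = c("No", "Yes"))

# Reverse the level order so that "Yes" becomes the first level
y_rev <- relevel(y, ref = "Yes")
levels(y_rev) # "Yes" "No"

fit     <- glm(y ~ x, family = binomial)     # models P(y == "Yes")
fit_rev <- glm(y_rev ~ x, family = binomial) # models P(y == "No")

# The two fits describe the same model with all signs flipped
signs_flip <- isTRUE(all.equal(coef(fit), -coef(fit_rev), tolerance = 1e-6))
print(signs_flip)
```

So the releveled model gives resampled sensitivity for 'Yes', but its coefficients should be read as log-odds of 'No', or negated before comparing them with the original fit.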

Caret's train() & resamples() reverse sensitivity/specificity for GLM
