Question

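In R, the summary of a glm provides a lot of useful information, but I could not find a misclassification rate or accuracy metric in it. Whenever I want these metrics, I have to re-run the prediction and compare it against the ground-truth labels. Is there a better way, for example extracting them from the glm result?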

> summary(glm(am~wt,mtcars,family = "binomial"))

Call:
glm(formula = am ~ wt, family = "binomial", data = mtcars)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.11400  -0.53738  -0.08811   0.26055   2.19931  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)   
(Intercept)   12.040      4.510   2.670  0.00759 **
wt            -4.024      1.436  -2.801  0.00509 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 43.230  on 31  degrees of freedom
Residual deviance: 19.176  on 30  degrees of freedom
AIC: 23.176

Number of Fisher Scoring iterations: 6

Answer 1

Here are some tips for evaluating the predictive ability of the model.

set.seed(1234)
# Generate a training and a testing set
idx <- sample(1:nrow(mtcars), size=round(0.5*nrow(mtcars)))
train <- mtcars[idx,]
test <- mtcars[-idx,]

# Fit model and evaluate prediction probabilities
glmfit <- glm(am ~ wt, train, family = "binomial")
test$pred <- predict(glmfit, type="response", newdata=test)

# Calculate the area under the ROC curve
library(pROC)
roc.curve <- roc(test$am, test$pred, ci=TRUE)

# Plot the ROC curve
plot(roc.curve)

# Calculates a cross-tabulation of observed and predicted classes 
# with associated statistics 
library(caret)
threshold <- 0.5
confusionMatrix(factor(test$pred>threshold), factor(test$am==1), positive="TRUE")

The output of the confusionMatrix command is:

Confusion Matrix and Statistics

          Reference
Prediction FALSE TRUE
     FALSE     8    0
     TRUE      3    5

               Accuracy : 0.8125          
                 95% CI : (0.5435, 0.9595)
    No Information Rate : 0.6875          
    P-Value [Acc > NIR] : 0.2134          

                  Kappa : 0.625           
 Mcnemar's Test P-Value : 0.2482          

            Sensitivity : 1.0000          
            Specificity : 0.7273          
         Pos Pred Value : 0.6250          
         Neg Pred Value : 1.0000          
             Prevalence : 0.3125          
         Detection Rate : 0.3125          
   Detection Prevalence : 0.5000          
      Balanced Accuracy : 0.8636          

       'Positive' Class : TRUE
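
Since the question also asks about Cohen's Kappa, the Accuracy and Kappa figures can be checked by hand from the four cell counts printed above. This is only a minimal base R sketch of the standard formulas, not a description of how caret computes them internally; the names `cm`, `p_observed`, `p_expected` and `cohen_kappa` are made up for illustration.

```r
# Cell counts copied from the confusion matrix printed above
# (rows = prediction, columns = reference); matrix() fills column by column.
cm <- matrix(c(8, 3, 0, 5), nrow = 2,
             dimnames = list(Prediction = c("FALSE", "TRUE"),
                             Reference  = c("FALSE", "TRUE")))

n <- sum(cm)

# Accuracy: share of cases on the diagonal.
p_observed <- sum(diag(cm)) / n

# Agreement expected by chance, from the row and column totals.
p_expected <- sum(rowSums(cm) * colSums(cm)) / n^2

# Cohen's Kappa: agreement beyond chance, rescaled so that 1 is perfect.
cohen_kappa <- (p_observed - p_expected) / (1 - p_expected)

p_observed   # 0.8125
cohen_kappa  # 0.625
```

Both values match the Accuracy and Kappa lines in the output above.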

Answer 2

I wrote this function to compute accuracy. You can choose the threshold to suit your context.

calc_accuracy <- function(stat_model, threshold = 0.5){

  # Observed 0/1 response and fitted probabilities, both taken from the
  # fitted glm (the response is stored in stat_model$y by default)
  observed  <- stat_model$y
  predicted <- stats::predict(stat_model, type = 'response')

  # Fix both factor levels so the table is always 2 x 2, even when every
  # prediction falls on the same side of the threshold
  confusion_matrix <- table(factor(observed == 1, levels = c(FALSE, TRUE)),
                            factor(predicted > threshold, levels = c(FALSE, TRUE)))

  # Accuracy is the share of observations on the diagonal
  accuracy <- sum(diag(confusion_matrix)) / sum(confusion_matrix)
  round(accuracy, 2)

}
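
On the original question of avoiding a second prediction pass: a fitted glm already stores the in-sample fitted probabilities and the 0/1 response, so training-set accuracy can be read off the object directly. The following is a minimal sketch, assuming a binary response and a 0.5 cutoff; `in_sample_accuracy` is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: in-sample accuracy straight from a fitted binomial glm.
in_sample_accuracy <- function(fit, threshold = 0.5) {
  probs    <- fitted(fit)  # fitted probabilities on the response scale
  observed <- fit$y        # response coded 0/1, kept on the object by default
  # Share of observations whose thresholded prediction matches the label
  mean((probs > threshold) == (observed == 1))
}

fit <- glm(am ~ wt, data = mtcars, family = "binomial")
in_sample_accuracy(fit)
```

Because this reuses the data the model was fitted on, the result will tend to be optimistic compared with a held-out estimate such as the train/test split in Answer 1.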

Can I get classification accuracy and Cohen's Kappa from glm results?
