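In R, the summary of a glm provides a lot of useful information, but I could not find a misclassification rate or accuracy metric in it. Whenever I want those metrics, I have to rerun the prediction and compare the result with the ground-truth labels. Is there a better way, for example extracting them from the glm result?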
> summary(glm(am~wt,mtcars,family = "binomial"))
Call:
glm(formula = am ~ wt, family = "binomial", data = mtcars)
Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.11400  -0.53738  -0.08811   0.26055   2.19931

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   12.040      4.510   2.670  0.00759 **
wt            -4.024      1.436  -2.801  0.00509 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 43.230 on 31 degrees of freedom
Residual deviance: 19.176 on 30 degrees of freedom
AIC: 23.176
Number of Fisher Scoring iterations: 6
Answer 0 (score: 1)
Here are some hints for evaluating the predictive ability of the model.
set.seed(1234)
# Generate a training and a testing set
idx <- sample(1:nrow(mtcars), size=round(0.5*nrow(mtcars)))
train <- mtcars[idx,]
test <- mtcars[-idx,]
# Fit model and evaluate prediction probabilities
glmfit <- glm(am ~ wt, train, family = "binomial")
test$pred <- predict(glmfit, type="response", newdata=test)
# Calculate the area under the ROC curve
library(pROC)
roc.curve <- roc(test$am, test$pred, ci=TRUE)
# Plot the ROC curve
plot(roc.curve)
# Calculates a cross-tabulation of observed and predicted classes
# with associated statistics
library(caret)
threshold <- 0.5
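# Fix both factor levels so the call still works if only one class is predicted
confusionMatrix(factor(test$pred > threshold, levels = c(FALSE, TRUE)),
                factor(test$am == 1, levels = c(FALSE, TRUE)), positive = "TRUE")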
The output of the confusionMatrix command is:
Confusion Matrix and Statistics
          Reference
Prediction FALSE TRUE
     FALSE     8    0
     TRUE      3    5
Accuracy : 0.8125
95% CI : (0.5435, 0.9595)
No Information Rate : 0.6875
P-Value [Acc > NIR] : 0.2134
Kappa : 0.625
Mcnemar's Test P-Value : 0.2482
Sensitivity : 1.0000
Specificity : 0.7273
Pos Pred Value : 0.6250
Neg Pred Value : 1.0000
Prevalence : 0.3125
Detection Rate : 0.3125
Detection Prevalence : 0.5000
Balanced Accuracy : 0.8636
'Positive' Class : TRUE
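The headline statistics in this output follow directly from the four cell counts, so they can be checked by hand. The following is a minimal base-R sketch that rebuilds the table from the counts printed above; the object name `cm` and the variable names are illustrative only and not part of caret.

```r
# Confusion matrix copied from the output above:
# rows = predicted class, columns = reference (observed) class
cm <- matrix(c(8, 3, 0, 5), nrow = 2,
             dimnames = list(Prediction = c("FALSE", "TRUE"),
                             Reference  = c("FALSE", "TRUE")))

# Accuracy: correctly classified cases (the diagonal) over all cases
accuracy <- sum(diag(cm)) / sum(cm)            # 13 / 16 = 0.8125
# Misclassification rate is the complement of accuracy
misclassification <- 1 - accuracy              # 3 / 16 = 0.1875

# Sensitivity: true positives over all observed positives
sensitivity <- cm["TRUE", "TRUE"] / sum(cm[, "TRUE"])     # 5 / 5
# Specificity: true negatives over all observed negatives
specificity <- cm["FALSE", "FALSE"] / sum(cm[, "FALSE"])  # 8 / 11
# Balanced accuracy: mean of sensitivity and specificity
balanced_accuracy <- (sensitivity + specificity) / 2

round(c(accuracy = accuracy, misclassification = misclassification,
        sensitivity = sensitivity, specificity = specificity,
        balanced_accuracy = balanced_accuracy), 4)
```

The rounded values agree with the Accuracy, Sensitivity, Specificity and Balanced Accuracy lines reported by confusionMatrix.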
Answer 1 (score: 0)
I wrote this function to compute accuracy. You can set the threshold to suit your context.
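calc_accuracy <- function(stat_model, threshold = 0.5){
  # Name of the target variable: the first column of the model frame
  # (assumes a 0/1 response, as in am ~ wt)
  target_name <- colnames(stat_model$model)[[1]]
  # Use the model frame so the rows match the fitted values
  # (rows with missing values have already been dropped there)
  data <- stat_model$model
  predicted <- stats::predict(stat_model, type = 'response')
  # Fix both levels so the table is always 2 x 2,
  # even when every observation is predicted as the same class
  confusion_matrix <- table(
    factor(data[[target_name]] == 1, levels = c(FALSE, TRUE)),
    factor(predicted > threshold, levels = c(FALSE, TRUE)))
  accuracy <- sum(diag(confusion_matrix)) / sum(confusion_matrix)
  round(accuracy, 2)
}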
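For a quick in-sample check, the fitted glm object already holds everything needed, so no second predict call is required. This is a minimal sketch, assuming a 0/1 response and that glm was called with its default `y = TRUE` so the response is stored in the object; the names `fit`, `predicted_class`, `accuracy` and `misclassification` are illustrative.

```r
# Model from the question, fitted on the full mtcars data
fit <- glm(am ~ wt, data = mtcars, family = "binomial")

# Assumed cut-off; choose it to suit your context
threshold <- 0.5

# fitted() returns the in-sample predicted probabilities;
# fit$y holds the 0/1 response used in the fit
predicted_class <- as.numeric(fitted(fit) > threshold)

# Share of observations whose predicted class matches the observed class
accuracy <- mean(predicted_class == fit$y)
misclassification <- 1 - accuracy

c(accuracy = accuracy, misclassification = misclassification)
```

Because these numbers are computed on the same data used to fit the model, they tend to be optimistic compared with a held-out test set such as the one used in Answer 0.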