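I am fitting a logistic regression and, when I predict, I get an error about NA values in my data. I have tried several approaches but keep getting the same error. This is my code: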
Modelo_lg <- glm(Default ~ TIPO_ID + Añomes + NOMBRE_PRO + Saldo_Corte + Provisión +
                 Calificación + Segmentación + Calif_R, data = ME, family = "binomial")
summary(Modelo_lg)
Call:
glm(formula = Default ~ TIPO_ID + Añomes + NOMBRE_PRO + Saldo_Corte +
Provisión + +Calificación + Segmentación + Calif_R, family = "binomial",
data = ME, na.action = na.omit)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.6762 -0.0407 -0.0129 -0.0037 4.4010
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.352e+02 8.849e+02 -0.379 0.705
TIPO_ID -1.014e-01 6.425e-02 -1.578 0.115
Añomes 1.559e-03 3.295e-04 4.731 2.24e-06 ***
NOMBRE_PROGB-CORPORATIVO M/E 8.334e-01 1.870e-01 4.457 8.33e-06 ***
NOMBRE_PROGB-PRESTAMOS REDES SIN GTIA 1.947e+00 1.293e-01 15.066 < 2e-16 ***
Saldo_Corte 6.447e-12 1.385e-11 0.465 0.642
Provisión -1.478e-11 2.201e-11 -0.671 0.502
CalificaciónB 2.992e+00 1.753e-01 17.070 < 2e-16 ***
CalificaciónC 6.624e+00 1.428e-01 46.395 < 2e-16 ***
CalificaciónD 8.702e+00 1.586e-01 54.865 < 2e-16 ***
CalificaciónE 1.003e+01 2.210e-01 45.368 < 2e-16 ***
SegmentaciónColombia_Corp -3.160e+00 3.301e-01 -9.575 < 2e-16 ***
SegmentaciónColombia_Emp -5.245e+00 3.562e-01 -14.723 < 2e-16 ***
SegmentaciónColombia_Miami -1.603e+01 1.030e+03 -0.016 0.988
SegmentaciónColombia_Pyme -2.481e+00 3.298e-01 -7.524 5.31e-14 ***
Calif_RR10 1.338e+01 8.824e+02 0.015 0.988
Calif_RR2 4.730e-01 1.012e+03 0.000 1.000
Calif_RR3 1.236e+01 8.824e+02 0.014 0.989
Calif_RR4 4.001e-01 9.229e+02 0.000 1.000
Calif_RR5 1.426e+01 8.824e+02 0.016 0.987
Calif_RR6 1.526e+01 8.824e+02 0.017 0.986
Calif_RR7 1.731e+01 8.824e+02 0.020 0.984
Calif_RR8 1.684e+01 8.824e+02 0.019 0.985
Calif_RR9 1.608e+01 8.824e+02 0.018 0.985
Calif_RSin R 1.470e+01 8.824e+02 0.017 0.987
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Null deviance: 25839.9 on 259715 degrees of freedom
Residual deviance: 5832.4 on 259691 degrees of freedom
(4 observations deleted due to missingness)
AIC: 5882.4
Number of Fisher Scoring iterations: 21
####Dividing the sample####
n<- dim(ME)[1]
set.seed(1234) # random sample
train <- sample(1:n , 0.7*n)
ME.test <- ME[-train,]
ME.train <- ME[train,]
ytrain <- ME$Default[train]
ytest <- ME$Default[-train]
###Predict
pred1<- predict.glm(Modelo_lg, newdata = ME.test, type="response")
result1<- table(ytest, floor(pred1+0.5))
result1
ytest 0 1
0 77131 99
1 161 524
error1<- sum(result1[1,2], result1[2,1])/sum(result1)
error1
library(ROCR)
pred = ROCR::prediction(pred1,ytest)
perf <- performance(pred, "tpr", "fpr")
Error: 'predictions' contains NA.
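I have already tried setting na.action = na.exclude in both my glm model and in predict.glm (as suggested here: How to Use `predict()` without errors in a model when you have missing data?). If I set it in predict.glm, I get a different error instead: all arguments must have the same length.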
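To make the problem easier to reproduce, here is a minimal sketch on made-up data. The data frame `toy`, its columns `y` and `x`, and the helper names `p`, `p_drop`, `ok`, `p_ok` and `y_ok` are invented for illustration and are not my real ME data. It assumes the NAs in the predictions come from rows of the test set that have a missing predictor. Under that assumption, one possible workaround is to drop the same rows from both the predictions and the labels before handing them to ROCR, so the two vectors stay the same length.

```r
# Toy data, invented for illustration only (not the real ME data)
set.seed(1)
toy <- data.frame(
  y = rbinom(200, 1, 0.3),
  x = rnorm(200)
)
toy$x[c(5, 50)] <- NA   # inject two missing predictor values

# glm() silently drops the incomplete rows when fitting
fit <- glm(y ~ x, data = toy, family = "binomial")

# With newdata and the default na.action (na.pass), predict() returns
# one value per row of newdata, so incomplete rows come back as NA
p <- predict(fit, newdata = toy, type = "response")
length(p)
sum(is.na(p))

# Passing na.omit (or na.exclude) to predict() drops those rows instead,
# so the result is shorter than the label vector
p_drop <- predict(fit, newdata = toy, type = "response", na.action = na.omit)
length(p_drop)

# Possible workaround: keep only rows where both the prediction and
# the label are present, applying the same filter to both vectors
ok   <- !is.na(p) & !is.na(toy$y)
p_ok <- p[ok]
y_ok <- toy$y[ok]

table(y_ok, floor(p_ok + 0.5))
# ROCR::prediction(p_ok, y_ok) should then receive NA-free vectors
# of equal length
```

If this is indeed what happens with my data, the same filter could be applied to `pred1` and `ytest` before calling `ROCR::prediction()`, but I am not sure whether silently dropping those rows is the right thing to do.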
I hope you can point me in the right direction. Thank you!