Getting "Error in glm.fit" while fitting a logistic regression (glm) model

Asked: 2018-02-02 00:55:30

Tags: r glm

I am fitting a logistic regression model with glm().

The problem lies with the last two variables: change_per_call and change_per_min.

The structure of my data is:

<pre>
   'data.frame':    66115 obs. of  54 variables:
         $ Customer_ID    : num  1030539 1025498 1053921 1005718 1046164 ...
         $ actvsubs       : num  1 1 1 2 1 1 1 2 2 1 ...
         $ adjrev         : num  1017 1784 821 1398 404 ...
         $ adjmou         : num  5690 9367 5161 7294 1975 ...
         $ avgmou         : num  284 468 469 203 152 ...
         $ avgrev         : num  50.8 89.2 74.6 38.9 31.1 ...
         $ avgqty         : num  92 277.2 116.8 59.5 65.8 ...
         $ age1           : num  52 60 30 56 0 20 52 34 32 0 ...
         $ age2           : num  34 66 28 48 0 0 0 0 0 0 ...
         $ blck_dat_Mean  : num  0 0 0 0 0 0 0 0 0 0 ...
         $ callwait_Mean  : num  0 4 0.667 0 0.333 ...
         $ callwait_Range : num  0 2 2 0 1 0 1 0 0 2 ...
         $ change_mou     : num  -97.5 129 339.8 86 -63.2 ...
         $ children       : num  1 1 0 1 0 0 0 0 0 0 ...
         $ comp_vce_Mean  : num  43.3 304.3 154.7 7 51 ...
         $ custcare_Mean  : num  0 0 1 0 0 ...
         $ datovr_Mean    : num  0 0 0 0 0 0 0 0 0 0 ...
         $ da_Mean        : num  0.743 0 0.495 0 0 ...
         $ drop_blk_Mean  : num  5.667 26.333 6.333 0.333 3.333 ...
         $ drop_dat_Mean  : num  0 0 0 0 0 0 0 0 0 0 ...
         $ drop_vce_Mean  : num  2 21 4.333 0.333 2 ...
         $ eqpdays        : num  633 178 97 748 427 474 279 332 362 885 ...
         $ forgntvl       : num  0 0 0 0 0 0 0 0 0 0 ...
         $ income         : num  6 6 8 9 5.79 ...
         $ months         : num  21 22 12 38 14 16 9 23 12 30 ...
         $ mou_Mean       : num  166 590 695 76 125 ...
         $ ovrmou_Mean    : num  10 253 91 0 0 ...
         $ ovrrev_Mean    : num  3.5 88.6 27.3 0 0 ...
         $ retdays        : num  235 235 235 235 235 ...
         $ rev_Range      : num  14.99 75.95 110.19 0 0.54 ...
         $ roam_Mean      : num  0.292 0 0 0 0 ...
         $ totcalls       : num  1843 5586 1285 2181 865 ...
         $ totrev         : num  1049 1835 911 1449 464 ...
         $ wrkwoman       : num  0 1 0 1 0 0 0 0 0 0 ...
         $ asl_flag       : num  0 0 1 0 0 0 0 0 0 0 ...
         $ dwlltype       : num  0 0 0 0 0 1 1 0 0 0 ...
         $ refurb_new     : num  1 0 1 1 1 1 1 1 1 1 ...
         $ mtrcycle       : num  0 0 0 0 0 0 0 0 0 0 ...
         $ truck          : num  0 1 0 0 0 0 0 0 0 0 ...
         $ hnd_price      : num  150 30 130 150 150 ...
         $ models         : num  1 2 2 2 1 1 1 2 1 1 ...
         $ numbcars       : num  2 1 1 2 1.57 ...
         $ churn          : num  1 1 1 1 1 1 1 1 1 1 ...
         $ urban          : num  1 0 0 0 0 0 1 0 0 1 ...
         $ city           : num  0 1 0 0 0 0 0 0 0 0 ...
         $ suburban       : num  0 0 1 1 0 1 0 0 1 0 ...
         $ town           : num  0 0 0 0 1 0 0 0 0 0 ...
         $ married        : num  1 1 0 1 0 0 0 1 0 0 ...
         $ married_A      : num  0 0 1 0 0 0 0 0 0 0 ...
         $ married_B      : num  0 0 0 0 0 0 0 0 0 1 ...
         $ married_S      : num  0 0 0 0 0 1 1 0 0 0 ...
         $ webcap         : num  1 1 1 1 1 1 1 1 1 0 ...
         <strong>$ change_per_call: num  0.569 0.328 0.709 0.664 0.537 ...
         $ change_per_min : num  0.179 0.19 0.159 0.192 0.205 ...</strong>
</pre>

When I fit the model without the change_per_call and change_per_min variables, I get the following output:

 model5 <- glm(formula =churn~ actvsubs+ adjrev+ 
                  #adjmou+ 
                  avgmou+ 
                  #avgrev+
                  #avgqty+
                  age1+
                  #age2+
                  #blck_dat_Mean+
                  #callwait_Mean+ 
                  #callwait_Range+ 
                  change_mou+ children+comp_vce_Mean+custcare_Mean+
                  #datovr_Mean+
                  #da_Mean+ 
                  #drop_blk_Mean+
                  #drop_dat_Mean+
                  drop_vce_Mean+eqpdays+
                  #forgntvl+
                  #income+
                  months+mou_Mean+
                  #ovrmou_Mean+
                  ovrrev_Mean+
                  retdays+rev_Range+roam_Mean+
                  #totcalls+
                  #totrev+
                  wrkwoman+asl_flag+dwlltype+refurb_new+
                  #mtrcycle+truck+
                  hnd_price+models
                #+numbcars
                ,data = train[,-1], family = "binomial")

    summary(model5)

Call:
glm(formula = churn ~ actvsubs + adjrev + avgmou + age1 + change_mou + 
    children + comp_vce_Mean + custcare_Mean + drop_vce_Mean + 
    eqpdays + months + mou_Mean + ovrrev_Mean + retdays + rev_Range + 
    roam_Mean + wrkwoman + asl_flag + dwlltype + refurb_new + 
    hnd_price + models, family = "binomial", data = train[, -1])

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.3444  -0.7723  -0.6684  -0.4529   2.6948  

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -5.420e-01  9.426e-02  -5.750 8.91e-09 ***
actvsubs       3.840e-02  1.769e-02   2.170 0.029970 *  
adjrev        -7.662e-05  2.339e-05  -3.276 0.001053 ** 
avgmou         7.974e-04  6.461e-05  12.341  < 2e-16 ***
age1          -6.242e-03  5.647e-04 -11.053  < 2e-16 ***
change_mou    -1.625e-04  4.536e-05  -3.583 0.000339 ***
children       7.449e-02  2.816e-02   2.645 0.008171 ** 
comp_vce_Mean -8.242e-04  1.943e-04  -4.242 2.21e-05 ***
custcare_Mean -7.873e-03  3.122e-03  -2.522 0.011669 *  
drop_vce_Mean  9.510e-03  1.618e-03   5.878 4.14e-09 ***
eqpdays        1.068e-03  7.618e-05  14.017  < 2e-16 ***
months        -1.189e-02  2.308e-03  -5.153 2.57e-07 ***
mou_Mean      -7.885e-04  6.308e-05 -12.502  < 2e-16 ***
ovrrev_Mean    5.766e-03  6.177e-04   9.335  < 2e-16 ***
retdays       -1.884e-03  3.047e-04  -6.184 6.24e-10 ***
rev_Range      3.309e-04  2.382e-04   1.389 0.164751    
roam_Mean      3.713e-03  1.160e-03   3.201 0.001370 ** 
wrkwoman      -5.017e-02  3.523e-02  -1.424 0.154425    
asl_flag      -4.570e-01  3.801e-02 -12.024  < 2e-16 ***
dwlltype       7.346e-02  2.841e-02   2.586 0.009715 ** 
refurb_new    -1.958e-01  3.632e-02  -5.390 7.03e-08 ***
hnd_price     -1.519e-03  2.359e-04  -6.436 1.23e-10 ***
models         1.059e-01  2.036e-02   5.201 1.98e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 50785  on 46279  degrees of freedom
Residual deviance: 49410  on 46257  degrees of freedom
AIC: 49456

If I add the variables change_per_call and change_per_min, I get an error message:

model6 <- glm(formula =churn~ actvsubs+ adjrev+ 
              #adjmou+ 
              avgmou+ 
              #avgrev+
              #avgqty+
              age1+
              #age2+
              #blck_dat_Mean+
              #callwait_Mean+ 
              #callwait_Range+ 
              change_mou+ children+comp_vce_Mean+custcare_Mean+
              #datovr_Mean+
              #da_Mean+ 
              #drop_blk_Mean+
              #drop_dat_Mean+
              drop_vce_Mean+eqpdays+
              #forgntvl+
              #income+
              months+mou_Mean+
              #ovrmou_Mean+
              ovrrev_Mean+
              retdays+rev_Range+roam_Mean+
              #totcalls+
              #totrev+
              wrkwoman+asl_flag+dwlltype+refurb_new+
              #mtrcycle+truck+
              hnd_price+models
            #+numbcars
            +change_per_call+ change_per_min,data = train[,-1], 
            family = "binomial")
Error in glm.fit(x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  : 
  NA/NaN/Inf in 'x'

Note: the variables change_per_call and change_per_min are both numeric, and neither contains any zero values.

How can I overcome this error?

After removing the infinite values from the two variables change_per_call and change_per_min, I get the following summary results:

    > summary(tele6$change_per_call)
        Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
      0.0353   0.2914   0.4377   0.8324   0.7084 366.9400 
    > summary(tele6$change_per_min)
        Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
      0.0100   0.0974   0.1409   0.3300   0.2276 622.3750 

@Ben Bolker, I now have no zeros and no infinite values, but glm.fit still throws the error.
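One thing that may be worth checking: the summaries above were computed on `tele6`, while the model is fitted on `train`. If `train` was split off before the infinite values were removed, it could still contain them. The sketch below uses a made-up data frame (`train_toy` and its values are hypothetical, not the poster's data) to reproduce the error and then filter the rows of the frame that is actually passed to glm().

```r
# Toy stand-in for the real training data; all names and values are hypothetical.
set.seed(1)
n <- 200
train_toy <- data.frame(
  churn = rbinom(n, 1, 0.3),
  avgmou = rnorm(n, 300, 50),
  change_per_call = runif(n, 0.05, 2)
)
# Plant two infinite values, as a ratio with a zero denominator might produce.
train_toy$change_per_call[c(5, 17)] <- Inf

# Fitting on the unfiltered frame fails; capture the message instead of stopping.
fit_err <- tryCatch(
  glm(churn ~ avgmou + change_per_call, data = train_toy, family = "binomial"),
  error = function(e) conditionMessage(e)
)
print(fit_err)

# Keep only rows where every predictor in the formula is finite.
# is.finite() is FALSE for NA, NaN, Inf and -Inf.
ok <- is.finite(train_toy$avgmou) & is.finite(train_toy$change_per_call)
train_clean <- train_toy[ok, ]

# The same formula now fits on the cleaned frame.
fit_ok <- glm(churn ~ avgmou + change_per_call,
              data = train_clean, family = "binomial")
print(nrow(train_clean))
```

Filtering on the frame given to `data =` (rather than on a copy) is the point of the sketch; whether dropping those rows is appropriate for the real data is a separate modelling decision.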

1 Answer:

Answer 0 (score: 0)

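Since most of your data is numeric, you could try standardizing it with scale(), or you could use lda() from the MASS package. LDA and logistic regression usually, though not always, give similar results.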
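A minimal sketch of both suggestions, on simulated data (the data frame `toy`, its coefficients and the 0.5 cutoff are assumptions made for illustration). Note that scale() only recentres and rescales: a column that still contains Inf or NaN stays non-finite after scaling, so this would not by itself remove the "NA/NaN/Inf in 'x'" error.

```r
library(MASS)  # provides lda(); MASS is a recommended package bundled with R

# Simulated stand-in data; names mirror the question but values are made up.
set.seed(42)
n <- 300
toy <- data.frame(
  avgmou = rnorm(n, 300, 80),
  change_per_call = rexp(n, rate = 1.2)
)
lin <- -2 + 0.004 * toy$avgmou + 0.5 * toy$change_per_call
toy$churn <- rbinom(n, 1, plogis(lin))

# Standardize the predictors to mean 0 and standard deviation 1.
predictors <- c("avgmou", "change_per_call")
toy_scaled <- toy
toy_scaled[predictors] <- scale(toy[predictors])

# Fit logistic regression and LDA on the same scaled predictors.
glm_fit <- glm(churn ~ avgmou + change_per_call,
               data = toy_scaled, family = "binomial")
lda_fit <- lda(churn ~ avgmou + change_per_call, data = toy_scaled)

# Compare the predicted classes on the training data.
glm_class <- as.integer(predict(glm_fit, type = "response") > 0.5)
lda_class <- as.integer(as.character(predict(lda_fit)$class))
agreement <- mean(glm_class == lda_class)
print(agreement)
```

The agreement rate printed at the end is only a rough way to see how closely the two classifiers track each other on a given data set.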

I assume you have already looked at the range() of these two variables to check whether they contain unusually large or small values.
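Such a check could look roughly like the following, shown here on a small hypothetical data frame (`toy` and its values are invented). It reports the range of each column and counts the entries that are not finite.

```r
# Hypothetical stand-in for the two ratio columns.
toy <- data.frame(
  change_per_call = c(0.569, 0.328, Inf, 0.664, NaN),
  change_per_min  = c(0.179, 0.190, 0.159, NA, 0.205)
)

# Range per column; na.rm = TRUE drops NA and NaN, but an Inf still shows up.
ranges <- sapply(toy, range, na.rm = TRUE)

# Number of entries per column that is.finite() rejects (NA, NaN, Inf, -Inf).
n_bad <- sapply(toy, function(x) sum(!is.finite(x)))

print(ranges)
print(n_bad)
```

Running the same two sapply() calls on the data frame passed to glm() would show which columns, if any, still carry non-finite values.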