Singularity error when running a Tobit regression

Time: 2017-12-27 08:42:14

Tags: r regression na

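I am trying to estimate a standard Tobit model that is censored at zero.

Variables

Dependent variable: Happiness

Independent variables

  • City (Chicago, New York),
  • Gender (male, female),
  • Employment (0 = unemployed, 1 = employed),
  • Worktype (unemployed, blue collar, white collar),
  • Holiday (unemployed, 1 day a week, 2 days a week)

The 'Worktype' and 'Holiday' variables are interacted with 'Employment'.

I ran the regression using the censReg package.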

censReg(Happiness ~ City + Gender + Employment:Worktype + Employment:Holiday)

However, summary() returns the following error.

Error in printCoefmat(coef(x, logSigma = logSigma), digits = digits) : 
  'x' must be coefficient matrix/data frame

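To find the cause, I ran an OLS regression.

There are some NA values, which I think come from the model design and the way the variables are set up (some variables appear to be singular: people with 'Employment' = 0 have 'Worktype' = Unemployed and 'Holiday' = Unemployed. Could this be the cause?)

How can I ignore the NA values and run the Tobit regression without errors?

Reproducible code is below.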

lm(Happiness ~ City + Gender + Employment:Worktype + Employment:Holiday)


Coefficients: (2 not defined because of singularities)
                               Estimate Std. Error t value Pr(>|t|)  
(Intercept)                      41.750      9.697   4.305   0.0499 *
CityNew York                    -44.500     11.197  -3.974   0.0579 .
Gender1                           2.750     14.812   0.186   0.8698  
Employment:WorktypeUnemployed        NA         NA      NA       NA  
Employment:WorktypeBluecolor     35.000     17.704   1.977   0.1867  
Employment:WorktypeWhitecolor   102.750     14.812   6.937   0.0202 *
Employment:Holiday1 day a week  -70.000     22.394  -3.126   0.0889 .
Employment:Holiday2 day a week       NA         NA      NA       NA 

1 Answer:

Answer 0 (score: 3)

If you step through the call to censReg in a debugger, you arrive at the following maxLik optimization:

result <- maxLik(censRegLogLikCross, start = start, 
      yVec = yVec, xMat = xMat, left = left, right = right, 
      obsBelow = obsBelow, obsBetween = obsBetween, obsAbove = obsAbove, 
      ...)

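The vector of initial values, start, is determined by an OLS regression and contains NA for two coefficients, as you already found:

  • Employment:WorktypeUnemployed
  • Employment:Holiday2 day a week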
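The NA start values can be spotted before calling censReg by looking at a plain OLS fit. A minimal sketch in base R follows; the data frame `d` is made up for illustration (it is not the data from the question) and only mimics the structure described there: every unemployed respondent has Worktype and Holiday equal to Unemployed.

```r
# Hypothetical data (8 rows); NOT the asker's data set.
d <- data.frame(
  Happiness  = c(0, 80, 20, 0, 150, 35, 60, 110),
  City       = factor(c("Chicago", "New York")[c(1, 1, 2, 2, 1, 2, 1, 2)]),
  Gender     = factor(c(0, 1, 0, 1, 0, 1, 1, 0)),
  Employment = c(0, 1, 1, 0, 1, 1, 0, 1),
  Worktype   = factor(
    c("Unemployed", "Bluecolor", "Whitecolor")[c(1, 2, 3, 1, 3, 2, 1, 3)],
    levels = c("Unemployed", "Bluecolor", "Whitecolor")
  ),
  Holiday    = factor(
    c("Unemployed", "1 day a week", "2 day a week")[c(1, 2, 3, 1, 3, 2, 1, 2)],
    levels = c("Unemployed", "1 day a week", "2 day a week")
  )
)

f <- Happiness ~ City + Gender + Employment:Worktype + Employment:Holiday

# OLS fit; coefficients that lm() cannot identify come back as NA.
ols <- lm(f, data = d)

# Names of the coefficients that are NA in the OLS fit.
na_coefs <- names(which(is.na(coef(ols))))
print(na_coefs)
```

If `na_coefs` is not empty, the default start values derived from OLS will contain NA as well, which is the situation described above.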

This causes maxLik to return NULL, with the error message:

Return code 100: Initial value out of range.

The summary function is then handed this NULL, which explains the final error message you received.

To override this, you can set the start parameter:

tobitreg <- censReg(formula = Happiness ~ City + Gender + Employment:Worktype +      
                      Employment:Holiday, start = rep(0,9) )
summary(tobitreg)

Call:
censReg(formula = Happiness ~ City + Gender + Employment:Worktype + 
    Employment:Holiday, start = rep(0, 9))

Observations:
         Total  Left-censored     Uncensored Right-censored 
             8              2              6              0 

Coefficients:
                               Estimate Std. error t value Pr(> t)
(Intercept)                      38.666        Inf       0       1
CityNew York                    -50.669        Inf       0       1
Gender1                        -360.633        Inf       0       1
Employment:WorktypeUnemployed     0.000        Inf       0       1
Employment:WorktypeBluecolor    345.674        Inf       0       1
Employment:WorktypeWhitecolor    56.210        Inf       0       1
Employment:Holiday1 day a week  346.091        Inf       0       1
Employment:Holiday2 day a week   55.793        Inf       0       1
logSigma                          1.794        Inf       0       1

Newton-Raphson maximisation, 141 iterations
Return code 1: gradient close to zero
Log-likelihood: -19.35431 on 9 Df

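Even though the error message is gone, the results are not reliable:

  • Std. error = Inf
  • Gradient close to zero: there is no unique optimum; the solutions form a hyperplane

An NA coefficient in the regression means that coefficient is linearly dependent on the others, so you need to drop some of them to get a unique solution.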

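As you suspected, the cause is that Worktype = Unemployed occurs only when Employment = 0, so the Employment:WorktypeUnemployed column is zero in every row and its coefficient cannot be estimated. The Employment:Holiday coefficients have the same kind of problem.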
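The dependence can be made visible by inspecting the design matrix directly. The sketch below is base R only and again uses a made-up data frame `d` (an assumption, not the data from the question) with the same structure. It checks the column rank, confirms that the Unemployed interaction column is all zeros, and shows that the Worktype and Holiday interaction columns add up to the same vector, namely Employment itself.

```r
# Hypothetical data (8 rows); NOT the asker's data set.
d <- data.frame(
  Happiness  = c(0, 80, 20, 0, 150, 35, 60, 110),
  City       = factor(c("Chicago", "New York")[c(1, 1, 2, 2, 1, 2, 1, 2)]),
  Gender     = factor(c(0, 1, 0, 1, 0, 1, 1, 0)),
  Employment = c(0, 1, 1, 0, 1, 1, 0, 1),
  Worktype   = factor(
    c("Unemployed", "Bluecolor", "Whitecolor")[c(1, 2, 3, 1, 3, 2, 1, 3)],
    levels = c("Unemployed", "Bluecolor", "Whitecolor")
  ),
  Holiday    = factor(
    c("Unemployed", "1 day a week", "2 day a week")[c(1, 2, 3, 1, 3, 2, 1, 2)],
    levels = c("Unemployed", "1 day a week", "2 day a week")
  )
)

# Design matrix for the right-hand side of the model in the question.
X <- model.matrix(~ City + Gender + Employment:Worktype + Employment:Holiday, d)

# Fewer independent columns than columns means the model is not identified.
n_cols <- ncol(X)
x_rank <- qr(X)$rank

# Employment is 0 whenever Worktype is Unemployed, so this column is all zeros.
zero_col <- all(X[, "Employment:WorktypeUnemployed"] == 0)

# Both groups of interaction columns sum to Employment, so one is redundant.
worktype_sum <- X[, "Employment:WorktypeBluecolor"] +
  X[, "Employment:WorktypeWhitecolor"]
holiday_sum  <- X[, "Employment:Holiday1 day a week"] +
  X[, "Employment:Holiday2 day a week"]
same_sum <- all(worktype_sum == holiday_sum) && all(worktype_sum == d$Employment)

print(c(columns = n_cols, rank = x_rank))
print(c(zero_column = zero_col, same_sum = same_sum))
```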

So I am afraid the regression model you are estimating has no unique optimum.

It works if you drop the interacted variables:

tobitreg <- censReg(formula = Happiness ~ City + Gender + Employment )
summary(tobitreg)
Call:
censReg(formula = Happiness ~ City + Gender + Employment)

Observations:
         Total  Left-censored     Uncensored Right-censored 
             8              2              6              0 

Coefficients:
             Estimate Std. error t value  Pr(> t)    
(Intercept)   38.6141     5.7188   6.752 1.46e-11 ***
CityNew York -50.1813     6.4885  -7.734 1.04e-14 ***
Gender1      -70.3859     8.2943  -8.486  < 2e-16 ***
Employment   111.5672    10.0927  11.054  < 2e-16 ***
logSigma       1.7930     0.2837   6.320 2.61e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Newton-Raphson maximisation, 8 iterations
Return code 1: gradient close to zero
Log-likelihood: -19.36113 on 5 Df
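If the work-type and holiday information should stay in the model, a possible alternative is to remove only the redundant columns instead of both interaction terms. The following is a rough sketch in base R under stated assumptions: the data frame `d` is made up (not the data from the question), and `drop_aliased` is a hypothetical helper name. It only verifies that the reduced design matrix has full column rank; whether a Tobit fit on it converges with so few observations is not checked here.

```r
# Hypothetical data (8 rows); NOT the asker's data set.
d <- data.frame(
  Happiness  = c(0, 80, 20, 0, 150, 35, 60, 110),
  City       = factor(c("Chicago", "New York")[c(1, 1, 2, 2, 1, 2, 1, 2)]),
  Gender     = factor(c(0, 1, 0, 1, 0, 1, 1, 0)),
  Employment = c(0, 1, 1, 0, 1, 1, 0, 1),
  Worktype   = factor(
    c("Unemployed", "Bluecolor", "Whitecolor")[c(1, 2, 3, 1, 3, 2, 1, 3)],
    levels = c("Unemployed", "Bluecolor", "Whitecolor")
  ),
  Holiday    = factor(
    c("Unemployed", "1 day a week", "2 day a week")[c(1, 2, 3, 1, 3, 2, 1, 2)],
    levels = c("Unemployed", "1 day a week", "2 day a week")
  )
)

f <- Happiness ~ City + Gender + Employment:Worktype + Employment:Holiday

# Hypothetical helper: keep only the design-matrix columns whose OLS
# coefficient is defined (i.e. not NA because of singularities).
drop_aliased <- function(formula, data) {
  X    <- model.matrix(formula, data)
  keep <- !is.na(coef(lm(formula, data = data)))
  X[, keep, drop = FALSE]
}

X_red <- drop_aliased(f, d)

# The reduced matrix should have as many independent columns as columns.
full_rank <- qr(X_red)$rank == ncol(X_red)
print(colnames(X_red))
print(full_rank)
```

The remaining columns (other than the intercept) could then be used as regressors in the censReg call; that step is left out because it depends on the real data.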