Returns NA

Date: 2017-11-20 06:37:23

Tags: r linearmodels

I'm new to R and not sure how to fix the error I'm getting. Here is a summary of my data:

> summary(data)
        Metro                          MrktRgn     MedAge     numHmSales   
     Abilene  : 1   Austin-Waco-Hill Country  : 6   20-25: 3   Min.   :  302  
     Amarillo : 1   Far West Texas            : 1   25-30: 6   1st Qu.: 1057  
     Arlington: 1   Gulf Coast - Brazos Bottom:10   30-35:28   Median : 2098  
     Austin   : 1   Northeast Texas           :14   35-40: 6   Mean   : 7278  
     Bay Area : 1   Panhandle and South Plains: 5   45-50: 2   3rd Qu.: 5086  
     Beaumont : 1   South Texas               : 7   50-55: 1   Max.   :83174  
     (Other)  :40   West Texas                : 3                             
        AvgSlPr          totNumLs         MedHHInc          Pop         
     Min.   :123833   Min.   :  1257   Min.   :37300   Min.   :   2899  
     1st Qu.:149117   1st Qu.:  6028   1st Qu.:53100   1st Qu.:  56876  
     Median :171667   Median : 11106   Median :57000   Median : 126482  
     Mean   :188637   Mean   : 24302   Mean   :60478   Mean   : 296529  
     3rd Qu.:215175   3rd Qu.: 25472   3rd Qu.:66200   3rd Qu.: 299321  
     Max.   :303475   Max.   :224230   Max.   :99205   Max.   :2196000  
     NA's   :1 

I then fit a model with AvgSlPr as the y variable and the other variables as x variables:

> model1 = lm(AvgSlPr ~ Metro + MrktRgn + MedAge + numHmSales + totNumLs + MedHHInc + Pop)

But when I summarize the model, I get NA for Std. Error, t value, and Pr(>|t|):

> summary(model1)

Call:
lm(formula = AvgSlPr ~ Metro + MrktRgn + MedAge + numHmSales + 
    totNumLs + MedHHInc + Pop)

Residuals:
ALL 45 residuals are 0: no residual degrees of freedom!

Coefficients: (15 not defined because of singularities)
                                  Estimate Std. Error t value Pr(>|t|)
(Intercept)                         143175         NA      NA       NA
MetroAmarillo                        24925         NA      NA       NA
MetroArlington                       35258         NA      NA       NA
MetroAustin                         160300         NA      NA       NA
MetroBay Area                        68642         NA      NA       NA
MetroBeaumont                         5942         NA      NA       NA
...
MrktRgnWest Texas                       NA         NA      NA       NA
MedAge25-30                             NA         NA      NA       NA
MedAge30-35                             NA         NA      NA       NA
MedAge35-40                             NA         NA      NA       NA
MedAge45-50                             NA         NA      NA       NA
MedAge50-55                             NA         NA      NA       NA
numHmSales                              NA         NA      NA       NA
totNumLs                                NA         NA      NA       NA
MedHHInc                                NA         NA      NA       NA
Pop                                     NA         NA      NA       NA

Residual standard error: NaN on 0 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:      1,     Adjusted R-squared:    NaN 
F-statistic:   NaN on 44 and 0 DF,  p-value: NA

Does anyone know what is wrong and how I can fix it? Also, I'm not supposed to use dummy variables.

1 Answer:

Answer 0: (score: 1)

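Your Metro variable has exactly one row for each factor level, and you need at least two points to fit a line. Because Metro gets its own coefficient for every level, the model reproduces all 45 observations exactly, which leaves no residual degrees of freedom for estimating standard errors. Let me demonstrate with an example: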

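# Toy data: four rows, each with its own Metro level (values are random, so your numbers will differ)
dat = data.frame(AvgSlPr=runif(4), Metro = factor(LETTERS[1:4]), MrktRgn = runif(4))
model1 = lm(AvgSlPr ~ Metro + MrktRgn, data = dat)
summary(model1)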

#Call:
#lm(formula = AvgSlPr ~ Metro + MrktRgn, data = dat)

#Residuals:
#ALL 4 residuals are 0: no residual degrees of freedom!

#Coefficients: (1 not defined because of singularities)
#            Estimate Std. Error t value Pr(>|t|)
#(Intercept)  0.33801         NA      NA       NA
#MetroB       0.47350         NA      NA       NA
#MetroC      -0.04118         NA      NA       NA
#MetroD       0.20047         NA      NA       NA
#MrktRgn           NA         NA      NA       NA

#Residual standard error: NaN on 0 degrees of freedom
#Multiple R-squared:      1,    Adjusted R-squared:    NaN 
#F-statistic:   NaN on 3 and 0 DF,  p-value: NA
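
One way to spot this situation before reading the summary is to count rows per factor level and compare the number of requested coefficients with the number of observations. The following is a minimal sketch on toy data of the same shape as above; the names `dat_check`, `fit_check`, `level_counts`, `n_coef`, `n_obs`, and `resid_df` are introduced here only for illustration, and the seed is an arbitrary choice.

```r
# Hypothetical toy data mirroring the example above: one row per Metro level.
set.seed(1)
dat_check <- data.frame(AvgSlPr = runif(4),
                        Metro   = factor(LETTERS[1:4]),
                        MrktRgn = runif(4))

# Rows per factor level: every count is 1, so Metro identifies each row uniquely.
level_counts <- table(dat_check$Metro)
print(level_counts)

fit_check <- lm(AvgSlPr ~ Metro + MrktRgn, data = dat_check)

# Coefficients requested by the formula versus observations available.
n_coef   <- ncol(model.matrix(fit_check))  # intercept + 3 Metro dummies + MrktRgn
n_obs    <- nobs(fit_check)                # rows actually used in the fit
resid_df <- df.residual(fit_check)         # observations minus estimable coefficients

cat("coefficients requested:", n_coef, "\n")
cat("observations:", n_obs, "\n")
cat("residual degrees of freedom:", resid_df, "\n")
```

When the number of requested coefficients is at least the number of observations, the residual degrees of freedom drop to zero and `summary()` has nothing left from which to estimate standard errors.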

However, if we add more data so that at least some factor levels have more than one row, the linear model can be estimated:

# Add a second row for Metro levels B, C and D so those levels are no longer unique to one row
dat = rbind(dat, data.frame(AvgSlPr=2:4, Metro=factor(LETTERS[2:4]), MrktRgn = 3:5))
model2 = lm(AvgSlPr ~ Metro + MrktRgn, data=dat)
summary(model2)

#Call:
#lm(formula = AvgSlPr ~ Metro + MrktRgn, data = dat)

#Residuals:
#         1          2          3          4          5          6          7 
# 9.021e-17  2.643e-01  7.304e-03 -1.498e-01 -2.643e-01 -7.304e-03  1.498e-01 

#Coefficients:
#            Estimate Std. Error t value Pr(>|t|)   
#(Intercept)  0.24279    0.30406   0.798  0.50834   
#MetroB      -0.10207    0.38858  -0.263  0.81739   
#MetroC      -0.06696    0.39471  -0.170  0.88090   
#MetroD       0.06804    0.41243   0.165  0.88413   
#MrktRgn      0.70787    0.06747  10.491  0.00896 **
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

#Residual standard error: 0.3039 on 2 degrees of freedom
#Multiple R-squared:  0.9857,   Adjusted R-squared:  0.9571 
#F-statistic: 34.45 on 4 and 2 DF,  p-value: 0.02841
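
The "2 degrees of freedom" in that output is simply 7 observations minus 5 estimated coefficients. As a rough check, the sketch below rebuilds data of the same shape and confirms the arithmetic; `dat2` and `fit2` are illustrative names, and the random values will not match the numbers printed above.

```r
# Hypothetical reconstruction of the extended toy data (random values differ from the output above).
set.seed(1)
dat2 <- data.frame(AvgSlPr = runif(4),
                   Metro   = factor(LETTERS[1:4]),
                   MrktRgn = runif(4))
dat2 <- rbind(dat2,
              data.frame(AvgSlPr = 2:4,
                         Metro   = factor(LETTERS[2:4]),
                         MrktRgn = 3:5))

# Levels B, C and D now have two rows each; A still has one.
print(table(dat2$Metro))

fit2 <- lm(AvgSlPr ~ Metro + MrktRgn, data = dat2)

# 7 observations - 5 coefficients = 2 residual degrees of freedom.
cat("observations:", nobs(fit2), "\n")
cat("coefficients estimated:", sum(!is.na(coef(fit2))), "\n")
cat("residual degrees of freedom:", df.residual(fit2), "\n")
```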

You need to reconsider the data used to fit the model. What is the goal of the analysis? What data do you need to achieve that goal?
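
As one possible direction, and only as an assumption about what the analysis might want, a variable that is unique to every row (as Metro is here) behaves like a row identifier and could be left out of the formula. The sketch below uses simulated data, not the data from the question; the data frame `toy`, the region labels, and the coefficient values used to generate `AvgSlPr` are all made up for illustration.

```r
# Simulated stand-in for the question's data: 45 rows, Metro unique per row.
set.seed(42)
n <- 45
toy <- data.frame(
  Metro      = factor(paste0("metro_", seq_len(n))),                      # one level per row, like an ID
  MrktRgn    = factor(rep(c("North", "South", "West"), length.out = n)),  # made-up regions, 15 rows each
  numHmSales = runif(n, 300, 80000),
  MedHHInc   = runif(n, 37000, 99000)
)
# Made-up relationship, used only so the fit has something to estimate.
toy$AvgSlPr <- 100000 + 1.5 * toy$MedHHInc + rnorm(n, sd = 20000)

# Leave Metro out: 5 coefficients (intercept, 2 region dummies, 2 numeric slopes) for 45 rows.
fit_no_metro <- lm(AvgSlPr ~ MrktRgn + numHmSales + MedHHInc, data = toy)

cat("residual degrees of freedom:", df.residual(fit_no_metro), "\n")
print(coef(summary(fit_no_metro)))  # standard errors are now defined
```

Whether dropping Metro is appropriate depends on the goal of the analysis; if differences between individual metros are the point, more than one observation per metro would be needed instead.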