"contrasts can be applied only to factors with 2 or more levels" error when the factors do have 2 or more levels (R)

Asked: 2017-04-12 05:53:14

Tags: r logistic-regression

I have a dataset from https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data with two factors, each with 2 or more levels, plus a target value, SalePrice.

  Street      Alley        SalePrice     
 Grvl:   6   Grvl:  50   Min.   : 34900  
 Pave:1454   Pave:  41   1st Qu.:129975  
             NA's:1369   Median :163000  
                         Mean   :180921  
                         3rd Qu.:214000  
                         Max.   :755000 

Running a linear regression on each factor separately works fine.

> summary(lm(SalePrice ~ Street, data=train))

Call:
lm(formula = SalePrice ~ Street, data = train)

Residuals:
    Min      1Q  Median      3Q     Max 
-146231  -51131  -18131   32869  573869 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   130190      32416   4.016 6.21e-05 ***
StreetPave     50940      32483   1.568    0.117    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 79400 on 1458 degrees of freedom
Multiple R-squared:  0.001684,  Adjusted R-squared:  0.0009992 
F-statistic: 2.459 on 1 and 1458 DF,  p-value: 0.117

> summary(lm(SalePrice ~ Alley, data=train))

Call:
lm(formula = SalePrice ~ Alley, data = train)

Residuals:
    Min      1Q  Median      3Q     Max 
-128001  -17001    1781   16999  133781 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   122219       5153  23.718  < 2e-16 ***
AlleyPave      45782       7677   5.963  4.9e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 36440 on 89 degrees of freedom
  (1369 observations deleted due to missingness)
Multiple R-squared:  0.2855,    Adjusted R-squared:  0.2775 
F-statistic: 35.56 on 1 and 89 DF,  p-value: 4.9e-08

However, running the regression on both factors together produces an error, which makes no sense to me.

> summary(lm(SalePrice ~ Street+Alley, data=train))
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

Can anyone help with this?

1 answer:

Answer 0 (score: 0)

I got a hint from this line in the question's output:   (1369 observations deleted due to missingness)

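By default, lm drops every row that has a missing value in any variable used by the model. When lm is run on Street and Alley together, the 1369 rows where Alley is NA are removed, and all of the remaining rows have Street equal to Pave. Street is therefore left with a single observed value, which triggers the error.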

> train[!is.na(Alley), Street]
 [1] Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave
[16] Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave
[31] Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave
[46] Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave
[61] Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave
[76] Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave Pave
[91] Pave
Levels: Grvl Pave
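
The same check can be sketched in base R on a small data.frame. The data below are synthetic stand-ins (the values are made up, not taken from the Kaggle file), and the column name Alley2 and the label "None" are hypothetical. The workaround at the end assumes that a missing Alley means the house has no alley access, which is how the competition's data description appears to define it; if the NAs were truly unknown values, a different approach would be needed.

```r
# Synthetic stand-in for the Kaggle data (values are made up for illustration).
train <- data.frame(
  Street    = factor(c("Grvl", "Grvl", rep("Pave", 8))),
  Alley     = factor(c(NA, NA, "Grvl", "Pave", "Grvl", "Pave", NA, NA, "Grvl", "Pave")),
  SalePrice = c(90, 95, 120, 170, 125, 165, 150, 155, 118, 172) * 1000
)

# Rows lm() would actually use: complete cases across all model variables.
model_vars <- c("SalePrice", "Street", "Alley")
used <- train[complete.cases(train[, model_vars]), ]

# Count the levels still observed in each factor after the NA rows are gone.
observed_levels <- sapply(used[, c("Street", "Alley")],
                          function(f) nlevels(droplevels(f)))
print(observed_levels)  # a factor with only 1 observed level makes lm() fail

# Possible workaround (assumption: NA in Alley means "no alley access"):
# turn NA into an explicit level so no rows are dropped.
train$Alley2 <- addNA(train$Alley)
levels(train$Alley2)[is.na(levels(train$Alley2))] <- "None"
fit <- lm(SalePrice ~ Street + Alley2, data = train)
print(coef(fit))
```

Recoding NA as its own level keeps all rows in the model, so Street retains both Grvl and Pave. Whether that recoding is appropriate depends on what the missing values mean in the data at hand.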