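R's lm shows strange behavior with ordered factors

Time: 2017-04-14 22:33:21

Tags: r lm

When I pass an ordered factor to the lm function, the result is unexpected (to me).

Surely there is a good explanation...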

# Generate some data
# parameters
n = 20L
set.seed(11L)

# Ordered factor
t <- factor(sample(c(1L, 2L), size = n, replace = TRUE),
       labels = c("Low", "High"),
       ordered = TRUE)
t
 [1] Low  Low  High Low  Low  High Low  Low  High Low  Low  Low  High
[14] High High High Low  Low  Low  Low 
Levels: Low < High

# not ordered factor, keep reference level as High
tno <- factor(t , ordered = FALSE)
tno <- relevel(tno, ref = "High")
tno
 [1] Low  Low  High Low  Low  High Low  Low  High Low  Low  Low  High
[14] High High High Low  Low  Low  Low 
Levels: High Low

# A simple indicator variable
ti <- t == "Low"
ti
 [1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE
[12]  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE

# Some dependent variable
y <- 10*rnorm(n)

# Run three regressions
# Observe that the ordered factor does not give the coefficients I expected
lm(y ~ t)
Call:
lm(formula = y ~ t)

Coefficients:
(Intercept)          t.L  
    -3.6082       0.8038

lm(y ~ tno)
Call:
lm(formula = y ~ tno)

Coefficients:
(Intercept)       tnoLow  
     -3.040       -1.137 

lm(y ~ ti)
Call:
lm(formula = y ~ ti)

Coefficients:
(Intercept)       tiTRUE  
     -3.040       -1.137  


# Confirm correct intercept
mean(y[t == "High"])
[1] -3.039771
# Just rounding difference...

1 answer:

Answer 0 (score: 4)

Try running this:

rest <- lm(y ~ t)
restno <- lm(y ~ tno)
resti <- lm(y ~ ti)

rest$fitted.values
restno$fitted.values
resti$fitted.values

rest$xlevels
restno$xlevels
resti$xlevels

rest$contrasts
restno$contrasts
resti$contrasts

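What you will see is, first, that the fitted values of all three models are exactly the same. So the model with the ordered factor is not "wrong"; it is the same fit expressed with different coefficients.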
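The claim about identical fitted values can be checked directly. The following is a minimal sketch on simulated data in the same shape as the question; the object names (grp, t_ord, t_unord, t_ind, fit_*) are illustrative and the numbers are not meant to reproduce the output printed above.

```r
# Illustrative data: 12 "Low" and 8 "High" observations (not the question's data)
set.seed(1L)
grp <- rep(c("Low", "High"), times = c(12L, 8L))
y <- 10 * rnorm(length(grp))

t_ord   <- factor(grp, levels = c("Low", "High"), ordered = TRUE)  # ordered factor
t_unord <- factor(grp, levels = c("High", "Low"))                  # unordered, High as reference
t_ind   <- grp == "Low"                                            # logical indicator

fit_ord   <- lm(y ~ t_ord)
fit_unord <- lm(y ~ t_unord)
fit_ind   <- lm(y ~ t_ind)

# The coefficients differ between the codings ...
coef(fit_ord)
coef(fit_unord)
coef(fit_ind)

# ... but the fitted values agree (up to floating-point tolerance)
same_ord_unord <- all.equal(unname(fitted(fit_ord)), unname(fitted(fit_unord)))
same_ord_ind   <- all.equal(unname(fitted(fit_ord)), unname(fitted(fit_ind)))
same_ord_unord
same_ord_ind
```

Both comparisons should come back TRUE, since each model simply predicts the group mean for every observation.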

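Second, you will see that the levels differ. rest$xlevels lists Low then High for t, restno$xlevels lists High then Low for tno (because of the relevel), and resti$xlevels is empty: ti is a logical vector rather than a factor, and lm simply treats it as a 0/1 indicator, which works because the variable is dichotomous.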

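Third, you will see that tno and ti use "contr.treatment", while t uses "contr.poly", which makes sense for an ordinal variable. With only two levels, contr.poly produces a single linear column (hence the coefficient name t.L), so the coefficient is no longer a plain difference between group means.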
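To see what the polynomial coding does for a two-level factor, it helps to print the contrast matrix and relate the coefficients to the group means. This is a small sketch on simulated data; the names grp, t_ord, m_low and m_high are illustrative, not part of the question.

```r
# Two-level polynomial contrast: one linear column, -1/sqrt(2) and +1/sqrt(2)
cp <- contr.poly(2)
cp

# Illustrative data: 12 "Low" and 8 "High" observations
set.seed(1L)
grp <- rep(c("Low", "High"), times = c(12L, 8L))
y <- 10 * rnorm(length(grp))
t_ord <- factor(grp, levels = c("Low", "High"), ordered = TRUE)

m_low  <- mean(y[grp == "Low"])
m_high <- mean(y[grp == "High"])

fit_ord <- lm(y ~ t_ord)

# Intercept = midpoint of the two group means;
# .L coefficient = (High mean - Low mean) / sqrt(2)
expected <- c((m_low + m_high) / 2, (m_high - m_low) / sqrt(2))
matches <- all.equal(unname(coef(fit_ord)), expected)
matches
```

Applied to the output printed in the question, the same arithmetic gives (-3.040 + -4.177) / 2 ≈ -3.608 and 1.137 / sqrt(2) ≈ 0.804, which agrees with the -3.6082 and 0.8038 reported by lm(y ~ t) up to rounding of the printed values.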

If you run

restno_poly <- lm(y ~ tno, contrasts = list(tno = "contr.poly"))
restno_poly

you will get

Call:
lm(formula = y ~ tno, contrasts = list(tno = "contr.poly"))

Coefficients:
(Intercept)        tno.L  
    -3.6082      -0.8038  

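Similarly,

rest_treatment <- lm(y ~ t, contrasts = list(t = "contr.treatment"))

gives the treatment-coded kind of result you were expecting, with Low (the first level of t) as the reference level.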
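The reason the two kinds of factor behave differently by default is that R keeps a separate default contrast function for unordered and for ordered factors in options("contrasts"). The sketch below prints that setting and checks a treatment-coded fit of an ordered factor against the group means; the data are simulated and the names grp, t_ord and fit_trt are illustrative.

```r
# Default contrast functions: first for unordered, second for ordered factors
getOption("contrasts")

# Illustrative data: 12 "Low" and 8 "High" observations
set.seed(1L)
grp <- rep(c("Low", "High"), times = c(12L, 8L))
y <- 10 * rnorm(length(grp))
t_ord <- factor(grp, levels = c("Low", "High"), ordered = TRUE)

# Override the default for this one term only
fit_trt <- lm(y ~ t_ord, contrasts = list(t_ord = "contr.treatment"))
fit_trt$contrasts

# Intercept = mean of the baseline level ("Low");
# slope = High mean - Low mean
m_low  <- mean(y[grp == "Low"])
m_high <- mean(y[grp == "High"])
trt_matches <- all.equal(unname(coef(fit_trt)), c(m_low, m_high - m_low))
trt_matches
```

Passing the contrasts argument to lm changes the coding for a single model, which is usually less surprising than changing the global option for the whole session.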

This page explains more.