当我将有序因子插入lm
函数时,结果是意外的(对我而言)。
当然有一个很好的解释......
# Generate some data
# parameters
n = 20L
set.seed(11L)
# Ordered factor
t <- factor(sample(c(1L, 2L), size = n, replace = TRUE),
label = c("Low", "High"),
ordered = TRUE)
t
[1] Low Low High Low Low High Low Low High Low Low Low High
[14] High High High Low Low Low Low
Levels: Low < High
# not ordered factor, keep reference level as High
tno <- factor(t , ordered = FALSE)
tno <- relevel(tno, ref = "High")
tno
[1] Low Low High Low Low High Low Low High Low Low Low High
[14] High High High Low Low Low Low
Levels: High Low
# A simple indicator variable
ti <- t == "Low"
ti
[1] TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE
[12] TRUE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
# Some dependent variable
y <- 10*rnorm(n)
# Run three regression
# Observe ordered factor is not giving the correct results
lm(y ~ t)
Call:
lm(formula = y ~ t)
Coefficients:
(Intercept) t.L
-3.6082 0.8038
lm(y ~ tno)
Call:
lm(formula = y ~ tno)
Coefficients:
(Intercept) tnoLow
-3.040 -1.137
lm(y ~ ti)
Call:
lm(formula = y ~ ti)
Coefficients:
(Intercept) tiTRUE
-3.040 -1.137
# Confirm correct intercept
mean(y[t == "High"])
[1] -3.039771
# Just rounding difference...
答案 0 :(得分:4)
尝试运行此
rest<- lm(y ~ t)
restno <- lm(y ~ tno)
resti <- lm(y ~ ti)
rest$fitted.values
restno$fitted.values
resti$fitted.values
rest$xlevels
restno$xlevels
resti$xlevels
rest$contrasts
restno$contrasts
resti$contrasts
您将看到的是,首先,所有三个模型的拟合值完全相同。因此,有序的模型并非“错误”。
其次,您将看到级别不同。事实上,只有tno有水平。其他人没有,因为你将它们视为数字,你可以做,因为这是一个二分变量。你还会看到因子是一个字符串而另外两个不是。
第三,你会看到tno和ti使用“contr.treatment”,而t使用“contr.poly”,这对于一个序数变量是有意义的。
如果你运行
restno_poly<- lm(y ~ tno, contrasts = list(tno = "contr.poly"))
restno_poly
你会得到
Call:
lm(formula = y ~ tno, contrasts = list(tno = "contr.poly"))
Coefficients:
(Intercept) tno.L
-3.6082 -0.8038
同样
rest_treatment<- lm(y ~ t, contrasts = list(t = "contr.treatment"))
同样给出了你期待的结果。