How to do a GLM when "contrasts can be applied only to factors with 2 or more levels"?

Time: 2018-05-11 17:20:54

Tags: r regression glm

I want to fit a regression in R using glm, but I get the contrasts error shown below. Is there a way to make this work?

mydf <- data.frame(Group=c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12),
                   WL=rep(c(1,0),12), 
                   New.Runner=c("N","N","N","N","N","N","Y","N","N","N","N","N","N","Y","N","N","N","Y","N","N","N","N","N","Y"), 
                   Last.Run=c(1,5,2,6,5,4,NA,3,7,2,4,9,8,NA,3,5,1,NA,6,10,7,9,2,NA))

mod <- glm(formula = WL~New.Runner+Last.Run, family = binomial, data = mydf)
#Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
# contrasts can be applied only to factors with 2 or more levels

1 answer:

Answer 0: (score: 2)

Using the debug_contr_error and debug_contr_error2 functions defined in the Q&A "How to debug “contrasts can be applied only to factors with 2 or more levels” error?", we can easily locate the problem: only a single level is left in the variable New.Runner.

info <- debug_contr_error2(WL ~ New.Runner + Last.Run, mydf)

info[c(2, 3)]
#$nlevels
#New.Runner 
#         1 
#
#$levels
#$levels$New.Runner
#[1] "N"

## the data frame that is actually used by `glm`
dat <- info$mf

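Contrasts cannot be applied to a single-level factor, because any kind of contrasts reduces the number of levels by 1. Since 1 - 1 = 0, this variable would be dropped from the model matrix.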
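The same diagnosis can be reproduced with base R alone, without the helper functions. The following is a minimal sketch, assuming the default na.action (na.omit): it rebuilds the model frame from the question's data and lists the values of New.Runner that survive once rows with a missing Last.Run are dropped. The names n_used and levels_left are introduced here for illustration only.

```r
# Data from the question.
mydf <- data.frame(
  Group = rep(1:12, each = 2),
  WL = rep(c(1, 0), 12),
  New.Runner = c("N","N","N","N","N","N","Y","N","N","N","N","N",
                 "N","Y","N","N","N","Y","N","N","N","N","N","Y"),
  Last.Run = c(1,5,2,6,5,4,NA,3,7,2,4,9,8,NA,3,5,1,NA,6,10,7,9,2,NA)
)

# model.frame() applies the default na.action (na.omit), so every row
# with a missing Last.Run is removed before the model matrix is built.
mf <- model.frame(WL ~ New.Runner + Last.Run, data = mydf)

# Number of rows actually used, and the distinct values left in New.Runner.
n_used <- nrow(mf)
levels_left <- unique(as.character(mf$New.Runner))

print(n_used)
print(levels_left)
```

Every "Y" row has a missing Last.Run, so removing incomplete cases leaves New.Runner with the single value "N".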

So can we simply require that no contrasts be applied to a single-level factor? No. All contrast methods forbid this:

contr.helmert(n = 1, contrasts = FALSE)
#Error in contr.helmert(n = 1, contrasts = FALSE) : 
#  not enough degrees of freedom to define contrasts

contr.poly(n = 1, contrasts = FALSE)
#Error in contr.poly(n = 1, contrasts = FALSE) : 
#  contrasts not defined for 0 degrees of freedom

contr.sum(n = 1, contrasts = FALSE)
#Error in contr.sum(n = 1, contrasts = FALSE) : 
#  not enough degrees of freedom to define contrasts

contr.treatment(n = 1, contrasts = FALSE)
#Error in contr.treatment(n = 1, contrasts = FALSE) : 
#  not enough degrees of freedom to define contrasts

contr.SAS(n = 1, contrasts = FALSE)
#Error in contr.treatment(n, base = if (is.numeric(n) && length(n) == 1L) n else length(n),  : 
#  not enough degrees of freedom to define contrasts
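For comparison, here is a small sketch of what the same call does when the factor has enough levels. With three levels, treatment contrasts produce two columns (one fewer than the number of levels), while contrasts = FALSE keeps one dummy column per level. The variable names with_contrasts and without_contrasts are illustrative.

```r
# Treatment contrasts for a 3-level factor: 3 levels become 2 columns,
# because the baseline level is absorbed into the intercept.
with_contrasts <- contr.treatment(n = 3)

# contrasts = FALSE requests full dummy coding: one column per level.
without_contrasts <- contr.treatment(n = 3, contrasts = FALSE)

print(dim(with_contrasts))
print(dim(without_contrasts))
```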

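In fact, if you think about it carefully, you will conclude that without contrasts, a factor with a single level is just a dummy variable of all 1s, i.e., an intercept. So you can definitely do the following: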

dat$New.Runner <- 1    ## set it to 1, as if no contrasts were applied

mod <- glm(formula = WL ~ New.Runner + Last.Run, family = binomial, data = dat)
#(Intercept)   New.Runner     Last.Run  
#     1.4582           NA      -0.2507

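You get an NA coefficient for New.Runner because of rank deficiency. In fact, applying contrasts is a fundamental way to avoid rank deficiency. It is just that when a factor has only one level, applying contrasts becomes a paradox.

Let's also have a look at the model matrix:

model.matrix(mod)
#   (Intercept) New.Runner Last.Run
#1            1          1        1
#2            1          1        5
#3            1          1        2
#4            1          1        6
#5            1          1        5
#6            1          1        4
#8            1          1        3
#9            1          1        7
#10           1          1        2
#11           1          1        4
#12           1          1        9
#13           1          1        8
#15           1          1        3
#16           1          1        5
#17           1          1        1
#19           1          1        6
#20           1          1       10
#21           1          1        7
#22           1          1        9
#23           1          1        2

The (Intercept) and New.Runner columns are identical, so only one of them can be estimated. If you want to estimate New.Runner, drop the intercept:

glm(formula = WL ~ 0 + New.Runner + Last.Run, family = binomial, data = dat)
#New.Runner    Last.Run  
#    1.4582     -0.2507

Make sure you digest the rank-deficiency issue thoroughly. If you have more than one single-level factor and you replace all of them by 1, dropping a single intercept still results in rank deficiency.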
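To illustrate the case of more than one single-level factor, here is a hedged sketch. Other.Factor is a hypothetical second single-level variable that does not exist in the question's data; it is added only to show that two all-ones columns remain collinear even after the intercept is removed.

```r
# Data from the question, restricted to complete cases (what glm uses).
mydf <- data.frame(
  WL = rep(c(1, 0), 12),
  New.Runner = c("N","N","N","N","N","N","Y","N","N","N","N","N",
                 "N","Y","N","N","N","Y","N","N","N","N","N","Y"),
  Last.Run = c(1,5,2,6,5,4,NA,3,7,2,4,9,8,NA,3,5,1,NA,6,10,7,9,2,NA)
)
dat <- na.omit(mydf)

# Replace the single-level factor by 1, and add a HYPOTHETICAL second
# single-level variable (not in the original data), also coded as 1.
dat$New.Runner <- 1
dat$Other.Factor <- 1

# No intercept, yet the two constant columns are still identical.
X <- model.matrix(~ 0 + New.Runner + Other.Factor + Last.Run, data = dat)
print(ncol(X))     # number of columns in the design matrix
print(qr(X)$rank)  # numerical rank, smaller than ncol(X)

# glm reports NA for the coefficient it cannot estimate.
fit <- glm(WL ~ 0 + New.Runner + Other.Factor + Last.Run,
           family = binomial, data = dat)
print(coef(fit))
```

Comparing qr(X)$rank with ncol(X) before fitting is one simple way to spot this kind of redundancy in a design matrix.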