Question

Consider the code:

x <- read.table("http://data.princeton.edu/wws509/datasets/cuse.dat",
                header=TRUE)[,1:2]

fit <- glm(education ~ age, family="binomial", data=x)

summary(fit)

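Age has 4 levels: "<25", "25-29", "30-39", "40-49".

The result (shown as an image in the original post) lists an intercept and coefficients for only three of the four age levels.

So by default one of the levels is used as the reference level. Is there a way to have glm output coefficients for all 4 levels plus an intercept (i.e. with no reference level)? Packages such as SAS do this by default, so I am wondering whether there is an option for it.

Thanks!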

Answer 1

See ?formula, and in particular what including + 0 in the model specification means...

# Sample data - explanatory variable (continuous)
x <- runif( 100 )
# explanatory data, factor with 3 levels
f <- as.factor( sample( 3 , 100 , TRUE ) )
# outcome data
y <- runif( 100 ) + rnorm(100) + rnorm( 100 , mean = c(1,3,6) )

# model without intercept
summary( glm( y ~ x + f + 0 ) )
#Call:
#glm(formula = y ~ x + f + 0)

#Deviance Residuals: 
#    Min       1Q   Median       3Q      Max  
#-5.7316  -1.8923   0.0195   1.8918   5.9520  

#Coefficients:
#   Estimate Std. Error t value Pr(>|t|)    
#x    0.3216     0.9772   0.329    0.743    
#f1   3.4493     0.6823   5.055 2.06e-06 ***
#f2   3.6349     0.6959   5.223 1.02e-06 ***
#f3   3.1962     0.6598   4.844 4.87e-06 ***
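
Applied to a model like the one in the question, where the only predictor is a factor, dropping the intercept gives one coefficient per level. The following is a minimal sketch on simulated data. The data frame `d` and its columns are made up here to mimic the shape of cuse.dat and are not the real data.

```r
set.seed(1)
# Simulated stand-in for the question's data (an assumption, not cuse.dat itself)
lv <- c("<25", "25-29", "30-39", "40-49")
d <- data.frame(
  age = factor(sample(lv, 200, replace = TRUE), levels = lv),
  education = factor(sample(c("low", "high"), 200, replace = TRUE),
                     levels = c("low", "high"))
)

# Default parameterization: intercept + 3 contrasts against the reference level
fit_ref <- glm(education ~ age, family = binomial, data = d)

# No intercept: one coefficient per age level
fit_all <- glm(education ~ age + 0, family = binomial, data = d)

coef(fit_ref)
coef(fit_all)

# Same model, different parameterization: each level's coefficient equals
# the intercept plus that level's contrast from the reference-level fit
b <- coef(fit_ref)
recovered <- c(b[1], b[1] + b[-1])
all.equal(unname(recovered), unname(coef(fit_all)), tolerance = 1e-6)
```

The two fits have the same fitted values and deviance; only the meaning of the coefficients changes. With + 0, each coefficient is the log-odds for its own level, so the tests reported by summary() compare each level against a log-odds of zero rather than against a reference level.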

Answer 2

You need to use the model.matrix function to convert the levels of the age factor into binary variables.

See this answer.

Edit: here is an example:

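x <- read.table("http://data.princeton.edu/wws509/datasets/cuse.dat",
                header=TRUE)[,1:2]
# one 0/1 indicator column per age level (the -1 drops the intercept column)
binary_variables <- model.matrix(~ x$age -1, x)
# drop the intercept here as well; otherwise the four indicator columns are
# collinear with it and one coefficient comes back as NA
fit <- glm(x$education ~ binary_variables + 0, family="binomial")
summary(fit)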
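
To see what model.matrix produces and why the intercept matters when every level has its own indicator column, here is a small sketch on simulated data. The vectors `age` and `y` are invented for illustration and only mimic the structure of the question's data.

```r
set.seed(2)
# Simulated stand-in for the age factor and a binary outcome (assumptions)
lv <- c("<25", "25-29", "30-39", "40-49")
age <- factor(sample(lv, 200, replace = TRUE), levels = lv)
y <- rbinom(200, 1, 0.5)

# One 0/1 indicator column per level
dummies <- model.matrix(~ age - 1)
head(dummies)

# Each row has exactly one 1, so the columns add up to the intercept column
stopifnot(all(rowSums(dummies) == 1))

# With the intercept left in, the design is rank-deficient:
# glm reports NA for the aliased coefficient
fit_with_int <- glm(y ~ dummies, family = binomial)
sum(is.na(coef(fit_with_int)))

# Without the intercept, every level gets an estimate
fit_no_int <- glm(y ~ dummies + 0, family = binomial)
coef(fit_no_int)
```

This is why a model cannot have a free intercept and a free coefficient for all four levels at once: one of the five parameters is redundant, so either the intercept or one level has to be dropped or otherwise constrained.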

Is there a way to fit `glm()` so that all levels are included (i.e. with no reference level)?
