I noticed that in the glm() function the order of the predictors in the formula can change the results, but I don't understand why:
mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv") #example
mydata$rank <- factor(mydata$rank)
# gpa was at the 2nd place
my.mod <- glm(admit ~ gre + gpa + rank, data = mydata, family = "binomial")
anova(my.mod, test="Chisq")$"Pr(>Chi)"
[1] NA 1.907193e-04 1.684783e-02 7.088456e-05
# here, rank was at the 2nd place
my.mod <- glm(admit ~ gre + rank + gpa, data = mydata, family = "binomial")
anova(my.mod, test="Chisq")$"Pr(>Chi)"
[1] NA 1.907193e-04 8.191817e-05 1.419044e-02
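Logistic regression (glm) is usually coupled with an ANOVA / chi-squared test to find the factors that have the greatest influence on the data set, with each factor weighed independently of the others, isn't it?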
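
For comparison, here is a minimal sketch of the two kinds of test involved. As far as I understand, anova() on a single glm fit adds the terms one at a time in formula order, so each p-value depends on which terms came before it, whereas drop1() tests each term against the model containing all the others. The data frame d, its coefficients, and the seed are hypothetical stand-ins simulated for illustration, not the UCLA file.

```r
# Simulated stand-in for the admissions data (hypothetical values, NOT the UCLA file)
set.seed(1)
n <- 400
gre  <- rnorm(n, mean = 580, sd = 115)
# gpa is made to correlate with gre so that term order visibly matters
gpa  <- 3.4 + 0.002 * (gre - 580) + rnorm(n, sd = 0.3)
rank <- factor(sample(1:4, n, replace = TRUE))
eta  <- -4 + 0.002 * gre + 0.8 * gpa - 0.5 * (as.integer(rank) - 1)
d    <- data.frame(admit = rbinom(n, 1, plogis(eta)), gre, gpa, rank)

# Same model, two term orders
m1 <- glm(admit ~ gre + gpa + rank, data = d, family = binomial)
m2 <- glm(admit ~ gre + rank + gpa, data = d, family = binomial)

# Sequential tests: each term is added to the terms listed before it
seq1 <- anova(m1, test = "Chisq")
seq2 <- anova(m2, test = "Chisq")

# Marginal tests: each term is dropped from the full model in turn
marg1 <- drop1(m1, test = "Chisq")
marg2 <- drop1(m2, test = "Chisq")

terms <- c("gre", "gpa", "rank")
print(cbind(seq_order1  = seq1[terms, "Pr(>Chi)"],
            seq_order2  = seq2[terms, "Pr(>Chi)"],
            drop_order1 = marg1[terms, "Pr(>Chi)"],
            drop_order2 = marg2[terms, "Pr(>Chi)"]))
```

In this sketch the two drop1() columns should agree with each other whatever the order, while the sequential columns agree only for gre, which comes first in both formulas. The last term of each sequential table should also match its drop1() value, since at that point every other term is already in the model.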