The following example data can be used:
X=as.data.frame(cbind(rep(c(1,23,456,7,8),5),c(2,34,89,4,52),c(1,4,32,4,5),c(81,30,32,41,100),c(1,-9,8,8,512),
c(5,356,854,33,522),c(12,31,34,345,565),c(11,84,889,42,2),c(18,349,8239,2,521),c(15,3,9,32,44),
c(67,55,9,4,2),c(18,114,56,7,77),c(89,23,56,41,52),c(21,234,5,4,2),c(133,776,88,42,54),
c(12,374,11,22,58),c(11,90,0,5,14),c(12,45,66,32,54),c(13,33,67,77,526),c(67,34,99,177,2)))
y=as.data.frame(c(0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0))
names(y)="y"
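So I have 20 independent variables and one binary dependent variable. I first want to estimate all one-variable models and select the one with the lowest deviance. Then I want to add one of the remaining variables to that model and select the two-variable model with the lowest deviance. I want to repeat this for all 3, ..., 20-variable models. That gives me 21 models (the intercept-only model and the 20 lowest-deviance k-variable models), from which I want to choose the one with the lowest BIC. This can be done with bestglm: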
library(bestglm)
Xy=cbind(X,y)
bestglm(Xy, family = binomial, IC = "BIC",,
method = "forward", intercept = TRUE)
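However, bestglm does not allow more than 15 covariates:
Error in bestglm(Xy, family = binomial, IC = "BIC", , method = "forward", : p = 20. must be <= 15 for GLM.
How can I do stepwise forward logistic selection with more than 15 covariates, as in the example?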
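To make the intended procedure concrete, here is a minimal sketch of it in base R, using only glm(), deviance() and BIC(). The helper name forward_by_deviance is hypothetical (it is not part of bestglm or any other package), and the sketch assumes y is passed as a plain 0/1 vector and that the columns of X have syntactic names such as V1, ..., V20.

```r
# Greedy forward selection by residual deviance, then choose the model
# size with the lowest BIC. forward_by_deviance is a hypothetical helper
# name used for illustration only.
forward_by_deviance <- function(X, y) {
  dat <- cbind(X, y = y)
  selected <- character(0)   # variables chosen so far, in order of entry
  remaining <- names(X)      # variables still available

  # Step 0: the intercept-only model.
  path <- list(glm(y ~ 1, family = binomial, data = dat))

  while (length(remaining) > 0) {
    # Fit every model that adds one remaining variable to the current set.
    fits <- lapply(remaining, function(v) {
      glm(reformulate(c(selected, v), response = "y"),
          family = binomial, data = dat)
    })
    # Keep the candidate with the lowest residual deviance.
    best <- which.min(vapply(fits, deviance, numeric(1)))
    selected <- c(selected, remaining[best])
    remaining <- remaining[-best]
    path[[length(path) + 1]] <- fits[[best]]
  }

  # Compare the intercept-only model and the best k-variable models by BIC.
  bic <- vapply(path, BIC, numeric(1))
  list(path = path, bic = bic, best = path[[which.min(bic)]])
}
```

With the example data this would be called as `res <- forward_by_deviance(X, y$y)`, after which `formula(res$best)` shows the selected model. glm() may emit warnings on this particular data set, since the 25 rows of X are five repetitions of the same 5 rows.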