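I am building a mortality prediction model in a logistic regression setting. For variable selection, I run stepwise regression with the stepAIC function from the MASS package. I have been using this function for a while without much trouble. With my current dataset, however, the function throws the error "Error in x[good, , drop = FALSE] : (subscript) logical subscript too long" when I select certain variables, while it runs smoothly with others.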
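I narrowed the model down to 4 variables and still get the error. I checked the variable lengths; the maximum is around 14. I also checked the distributions of the selected variables, and they look fine to me. The VIFs are all around 2 to 2.5. I even removed the NA values. Interestingly, the same code runs smoothly on the same dataset with a different set of variables. Any help would be greatly appreciated.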
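The checks listed above can be scripted in base R. This is a minimal sketch, assuming a data frame shaped like `tempdata`: only the column names come from the question, while the values are simulated and one missing value is planted on purpose. It counts missing values per column (including `offset_baseqx`, which the question passes to `glm()` separately from the data frame), counts the rows that are complete across every column the largest model uses, and computes VIFs by hand as 1 / (1 - R^2). The helper `vif_manual` is hypothetical and stands in for whichever VIF function was actually used.

```r
# Hypothetical stand-in for tempdata: column names from the question,
# values simulated only so the sketch runs on its own.
set.seed(1)
n <- 500
tempdata <- data.frame(
  death_count     = rbinom(n, 1, 0.02),
  offset_baseqx   = rnorm(n, -5, 1.5),
  pct50to74_acs   = runif(n, 0, 0.1),
  pct75to99_acs   = runif(n, 0, 0.1),
  pct100to124_acs = runif(n, 0, 0.1)
)
tempdata$pct75to99_acs[3] <- NA  # one deliberately planted missing value

cols  <- c("death_count", "offset_baseqx",
           "pct50to74_acs", "pct75to99_acs", "pct100to124_acs")
preds <- c("pct50to74_acs", "pct75to99_acs", "pct100to124_acs")

# Missing values per column, including the offset column.
na_counts <- colSums(is.na(tempdata[, cols]))
print(na_counts)

# Rows usable by the largest model (all predictors plus the offset).
n_complete <- sum(complete.cases(tempdata[, cols]))
print(n_complete)

# Hypothetical helper: VIF_j = 1 / (1 - R^2_j), where R^2_j comes from
# regressing predictor j on all the other predictors.
vif_manual <- function(df) {
  sapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    r2 <- summary(lm(reformulate(others, response = v), data = df))$r.squared
    1 / (1 - r2)
  })
}
vifs <- vif_manual(na.omit(tempdata[, preds]))
print(vifs)
```

Here the predictors are independent uniforms, so the VIFs come out near 1. On the real data they would be expected to match the 2 to 2.5 reported above.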
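library(MASS)  # stepAIC() is provided by the MASS package
# Full model: all three candidate predictors, offset supplied as a separate vector
logitmodel = glm(death_count ~ ., tempdata[, c("death_count", "pct50to74_acs", "pct75to99_acs", "pct100to124_acs")],
                 offset = tempdata$offset_baseqx, family = "binomial")
# Null model: intercept only, same data and offset
nothing = glm(death_count ~ 1, tempdata[, c("death_count", "pct50to74_acs", "pct75to99_acs", "pct100to124_acs")],
              offset = tempdata$offset_baseqx, family = "binomial")
# Search between the null model (lower) and the full model (upper)
myscope = list(lower = formula(nothing), upper = formula(logitmodel))
# Forward selection by AIC (k = 2)
ForwardModel = stepAIC(nothing, scope = myscope, direction = "forward", k = 2)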
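The help pages for `step()` and `stepAIC()` note that all models in the search must be fitted to the same dataset, and suggest removing missing values first. The following is one possible restructuring, offered as a sketch to try and not as a confirmed fix for this particular error. It drops incomplete rows across every needed column once, up front, and moves the offset into the formula with `offset()`, so the offset is always looked up in the same data frame as the predictors whenever `stepAIC()` refits. The data are simulated, and the name `moddata` and the coefficient used to generate `death_count` are illustrative assumptions.

```r
library(MASS)  # provides stepAIC()

# Simulated stand-in data (hypothetical values, same column names as the question).
set.seed(42)
n <- 2000
tempdata <- data.frame(
  offset_baseqx   = rnorm(n, -4, 1),
  pct50to74_acs   = runif(n, 0, 0.1),
  pct75to99_acs   = runif(n, 0, 0.1),
  pct100to124_acs = runif(n, 0, 0.1)
)
tempdata$death_count <- rbinom(n, 1, plogis(tempdata$offset_baseqx +
                                            5 * tempdata$pct50to74_acs))

cols <- c("death_count", "offset_baseqx",
          "pct50to74_acs", "pct75to99_acs", "pct100to124_acs")
# Remove incomplete rows once, so every candidate model sees the same rows.
moddata <- na.omit(tempdata[, cols])

# Offset inside the formula: it is re-read from moddata on every refit.
nothing <- glm(death_count ~ 1 + offset(offset_baseqx),
               family = binomial, data = moddata)

ForwardModel <- stepAIC(
  nothing,
  scope = list(lower = ~ 1,
               upper = ~ pct50to74_acs + pct75to99_acs + pct100to124_acs),
  direction = "forward", k = 2, trace = 0
)
print(formula(ForwardModel))
```

Passing the whole data frame together with `offset = offset_baseqx` (unquoted, so `glm()` evaluates it in `data`) should be an equivalent alternative to the `offset()` term.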
Below is a summary of the variables:
> summary(tempdata)
  death_count      offset_baseqx       pct50to74_acs     pct75to99_acs     pct100to124_acs
 Min.   :0.00000   Min.   :-12.68618   Min.   :0.00000   Min.   :0.00000   Min.   :0.00000
 1st Qu.:0.00000   1st Qu.: -6.37140   1st Qu.:0.01375   1st Qu.:0.01812   1st Qu.:0.02183
 Median :0.00000   Median : -5.13844   Median :0.02640   Median :0.03383   Median :0.03783
 Mean   :0.01698   Mean   : -5.18591   Mean   :0.03229   Mean   :0.03784   Mean   :0.04040
 3rd Qu.:0.00000   3rd Qu.: -3.95894   3rd Qu.:0.04486   3rd Qu.:0.05279   3rd Qu.:0.05511
 Max.   :1.00000   Max.   :  0.03052   Max.   :0.10469   Max.   :0.10453   Max.   :0.10119
This is the error I get:
> ForwardModel = stepAIC(nothing, scope = myscope, direction = "forward", k = 2)
Start: AIC=224385.6
death_count ~ 1
Error in x[good, , drop = FALSE] : (subscript) logical subscript too long
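The message means that a logical row index (`good`) is longer than the number of rows of the matrix being indexed (`x`). One plausible reading, not verified against this dataset, is that the model matrix rebuilt for a larger candidate model has fewer rows than the model `stepAIC()` started from. That can happen, for example, when rows with a missing predictor value are dropped only once that predictor enters the model. The sketch below reproduces the message with a plain matrix. It then shows a quick row-count check with `nobs()` on toy data, whose values are made up for illustration.

```r
# A 3-row matrix indexed by a 4-element logical vector: R refuses with
# the same message as in the question.
x <- matrix(1:6, nrow = 3)
good <- c(TRUE, TRUE, TRUE, TRUE)
msg <- tryCatch({
  x[good, , drop = FALSE]
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Toy data (hypothetical) with one missing predictor value in row 3.
df <- data.frame(
  y = c(0, 1, 0, 0, 1, 0, 1, 0, 1, 1),
  z = c(0.1, 0.4, NA, 0.2, 0.9, 0.3, 0.7, 0.5, 0.1, 0.8)
)
m_null <- glm(y ~ 1, family = binomial, data = df)  # keeps all 10 rows
m_full <- glm(y ~ z, family = binomial, data = df)  # na.omit drops row 3
print(c(null = nobs(m_null), full = nobs(m_full)))
```

Running `nobs(nothing)` and `nobs(logitmodel)` on the real models would show whether the same kind of mismatch is present there.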