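I'm having some trouble with the predict function when using bayesglm. I've read some posts saying this problem can arise when the out-of-sample data contains more factor levels than the in-sample data, but I'm using the same data for both the fit and predict functions. Prediction works fine with regular glm, but not with bayesglm. For example: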
control <- y ~ x1 + x2
# this works fine:
glmObject <- glm(control, myData, family = binomial())
predicted1 <- predict.glm(glmObject , myData, type = "response")
# this gives an error:
bayesglmObject <- bayesglm(control, myData, family = binomial())
predicted2 <- predict.bayesglm(bayesglmObject , myData, type = "response")
# Error in X[, piv, drop = FALSE] : subscript out of bounds
# Edit... I just discovered this works.
# Should I be concerned about using these results?
# Not sure why it fails when I specify the dataset
predicted3 <- predict(bayesglmObject, type = "response")
I can't figure out how to predict with a bayesglm object. Any ideas? Thanks!
Answer 0 (score: 2)
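One possible cause is the default setting of the drop.unused.levels argument in the bayesglm call. By default this argument is TRUE, so any unused factor levels are dropped during model fitting. The predict function, however, still works from the original data, in which the factor variable retains its unused levels. This produces a mismatch in levels between the data used to build the model and the data used for prediction, even when it is the same data frame (myData in your case). An example is given below: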
library(arm)  # provides bayesglm
n <- 100
x1 <- rnorm(n)
x2 <- as.factor(sample(c(1, 2, 3), n, replace = TRUE))
# Recoding every 3 as 2 leaves level "3" defined but unused
x2[x2 == 3] <- 2
y <- as.factor(sample(c(1,2),n,replace = TRUE))
myData <- data.frame(x1 = x1, x2 = x2, y = y)
control <- y ~ x1 + x2
# this works fine:
glmObject <- glm(control, myData, family = binomial())
predicted1 <- predict.glm(glmObject , myData, type = "response")
# this gives an error - this uses default drop.unused.levels = TRUE
bayesglmObject <- bayesglm(control, myData, family = binomial())
predicted2 <- predict.bayesglm(bayesglmObject , myData, type = "response")
# Error in X[, piv, drop = FALSE] : subscript out of bounds
# this works fine - value of drop.unused.levels is set to FALSE
bayesglmObject <- bayesglm(control, myData, family = binomial(), drop.unused.levels = FALSE)
predicted2 <- predict.bayesglm(bayesglmObject , myData, type = "response")
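I think a better approach is to use droplevels to remove unused levels from the data frame beforehand, and then use that data frame for both model fitting and prediction.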
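The droplevels approach could look something like the sketch below. It assumes the arm package is installed and uses the same kind of simulated data as the example above. The names cleanData, fit and pred are made up for illustration, and whether the original error reproduces at all may depend on the arm version.

```r
library(arm)  # provides bayesglm

set.seed(1)   # only so the simulated data are reproducible
n <- 100
x2 <- factor(sample(c(1, 2, 3), n, replace = TRUE))
x2[x2 == 3] <- 2                      # level "3" is now defined but unused
myData <- data.frame(
  x1 = rnorm(n),
  x2 = x2,
  y  = factor(sample(c(1, 2), n, replace = TRUE))
)

levels(myData$x2)                     # "1" "2" "3" (unused level still present)

# Drop unused levels once, then use the cleaned data frame everywhere
cleanData <- droplevels(myData)
levels(cleanData$x2)                  # "1" "2"

fit  <- bayesglm(y ~ x1 + x2, data = cleanData, family = binomial())
pred <- predict(fit, newdata = cleanData, type = "response")
```

Because the model and the prediction data now share the same factor levels, the result no longer depends on the drop.unused.levels setting.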