Question

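This may be more of a bug report than a question, but: why does explicitly passing the training dataset as the newdata argument to predict sometimes give different predictions than omitting newdata altogether?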

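library(lme4)
packageVersion("lme4") # 1.1.8
# myformula and X stand for the asker's own model formula and training data (not shown)
m1 <- glmer(myformula, data=X, family="binomial")
p1 <- predict(m1, type="response")            # training data implied, newdata omitted
p2 <- predict(m1, type="response", newdata=X) # same training data, passed explicitly
all(p1==p2) # FALSE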

This is not just a rounding error: cor(p1, p2) returns 0.8.
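Because `==` on floating-point vectors is fragile, a more direct way to quantify the mismatch is to look at the largest absolute difference alongside the correlation. The sketch below uses a hypothetical helper, `compare_predictions`, and lme4's bundled `cbpp` data with a random-intercept binomial model (our choice for a self-contained example, not the asker's data). With a version of lme4 that behaves correctly, the difference should be at floating-point noise level and the correlation essentially 1.

```r
library(lme4)

# Hypothetical helper: compare implicit (no newdata) vs explicit (newdata = training data) predictions
compare_predictions <- function(model, data) {
  p_implicit <- predict(model, type = "response")                  # newdata omitted
  p_explicit <- predict(model, newdata = data, type = "response")  # same data, passed explicitly
  c(max_abs_diff = max(abs(unname(p_implicit) - unname(p_explicit))),
    correlation  = cor(unname(p_implicit), unname(p_explicit)))
}

# Random-intercept binomial model on lme4's bundled cbpp data
m_cbpp <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
                data = cbpp, family = binomial)
res <- compare_predictions(m_cbpp, cbpp)
print(res)
```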

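This seems to be isolated to models with random slopes. In the figure below, "implicit" means predict(..., type="response") without newdata, and "explicit" means predict(..., type="response", newdata=X), where X is the same dataset used for training. The only difference between model 1 and the other models is that model 1 contains only a (random) intercept, whereas the other models have both random intercepts and random slopes.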
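A self-contained way to reproduce the comparison in the figure is to fit an intercept-only model and a random-slope model to the same data and check each one. The data below are simulated (an assumption for illustration: 30 groups, 40 binary observations each, with group-specific intercepts and slopes on a continuous covariate `x`), and `same_predictions` is a hypothetical helper. On an lme4 build affected by the bug, the random-slope check is where a mismatch would be expected to show up; with the fix, both checks should return TRUE.

```r
library(lme4)
set.seed(42)

# Simulated data (illustrative assumption): random intercepts u0 and random slopes u1 per group g
n_groups <- 30
n_per    <- 40
d <- data.frame(g = factor(rep(seq_len(n_groups), each = n_per)),
                x = rnorm(n_groups * n_per))
u0  <- rnorm(n_groups, sd = 1.0)           # group-level intercept deviations
u1  <- rnorm(n_groups, sd = 0.5)           # group-level slope deviations
eta <- -0.5 + u0[d$g] + (1 + u1[d$g]) * d$x  # factor index uses the level codes 1..30
d$y <- rbinom(nrow(d), size = 1, prob = plogis(eta))

m_int   <- glmer(y ~ x + (1 | g), data = d, family = binomial)  # random intercept only (like model 1)
m_slope <- glmer(y ~ x + (x | g), data = d, family = binomial)  # random intercept + random slope

# Hypothetical helper: TRUE if implicit and explicit predictions agree up to all.equal tolerance
same_predictions <- function(model, data) {
  isTRUE(all.equal(unname(predict(model, type = "response")),
                   unname(predict(model, newdata = data, type = "response"))))
}
ok_int   <- same_predictions(m_int, d)
ok_slope <- same_predictions(m_slope, d)
c(intercept_only = ok_int, random_slope = ok_slope)
```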

[Figure: implicit vs. explicit predictions for each model (image not available)]

Answer 1

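It turns out this was a bug in predict.merMod, which has been fixed in the development version (November 2014, see this GitHub issue). If you have compilation tools installed, you can install the development version directly from GitHub: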

devtools::install_github("lme4/lme4")

predict.glmer on the training set differs with and without newdata
