Question

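A common approach in regression is to predict the outcome at the sample means of the covariates.

This is complicated when there are binary independent variables; the typical approach is to plug the sample proportion in against the relevant coefficient, like so: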

set.seed(102349)
x = runif(1e3) > .73
y = 6 - 3.4 * x + rnorm(1e3)
reg <- lm(y ~ x)

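By construction, the sample mean of x is about 0.27, so to get the predicted y associated with a "typical" observation, we plug .27 in against the coefficient xTRUE: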

sum(reg$coefficients * c(1, mean(x)))
# [1] 5.032203

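In more complicated models, however, this is cumbersome and obtuse. Is there any way to get predict to work for a model like this?

The following is an error, since lm keeps track of the class of the variable it was fit with (logical here, whereas mean(x) is numeric):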

predict(reg, data.frame(x = mean(x)))

I was able to trick predict into working as follows, but it seems ill-advised to overwrite the original model object like this:

#surprise, predict! it's not logical after all!
attr(reg$terms, "dataClasses")["x"] <- "numeric"

#and because it's not logical, there are no contrasts either!
reg$contrasts <- NULL

#try and complain now, chump!
predict(reg, data.frame(x = mean(x)))
#        1 
# 5.032203
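
For comparison, here is a minimal sketch of the same calculation that leaves reg untouched. It assumes the prediction at the sample means is simply the column means of the fitted design matrix multiplied by the coefficients. The names X and pred_at_means are introduced here for illustration only, and this is not a way of making predict itself accept mean(x).

```r
# Rebuild the example from the question
set.seed(102349)
x <- runif(1e3) > .73
y <- 6 - 3.4 * x + rnorm(1e3)
reg <- lm(y ~ x)

# Design matrix used in the fit: columns "(Intercept)" and "xTRUE"
X <- model.matrix(reg)

# Column means are c(1, mean(x)); their inner product with the
# coefficients is the prediction at the sample means.
# reg itself is not modified.
pred_at_means <- drop(colMeans(X) %*% coef(reg))
pred_at_means
```

This should agree with the manual sum(reg$coefficients * c(1, mean(x))) calculation. One caveat: in a model with interactions or transformed terms, this averages the constructed columns of the design matrix, which is not necessarily the same as plugging in the means of the underlying variables.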

Using predict at sample means with binary variables

0 answers: