Question

I'm not sure whether this question fits better here or on Cross Validated. I hope I made the right choice.

Consider this example:

library(dplyr)
setosa <- iris %>% filter(Species == "setosa") %>% select(Sepal.Length, Sepal.Width, Species)
library(ggplot2)
ggplot(data = setosa, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2))

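By default, ggplot "displays confidence interval around smooth" (see here), shown as the grey area around the regression curve. I had always assumed these were simultaneous confidence bands for the regression curve rather than pointwise confidence bands. The ggplot2 documentation refers to the predict function for details on how the standard errors are computed. However, the documentation for predict.lm does not explicitly say that simultaneous confidence intervals are computed. So what is the correct interpretation here?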

Answer 1

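One way to check what predict.lm() computes is to inspect its code (predict multiplies the standard errors by qt((1 - level)/2, df), so it does not appear to make any adjustment for simultaneous inference). Another way is to construct simultaneous confidence intervals and compare them with the intervals from predict.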
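
The first check can also be done numerically rather than by reading the source. The following is a minimal sketch, assuming the same quadratic model as in the question: it rebuilds the interval by hand from the standard errors and a plain t quantile, then compares it with what predict returns. The names level, p, tcrit, manual and builtin are illustrative only.

```r
# Same model as in the question, fitted with base R only.
setosa <- subset(iris, Species == "setosa")
fit <- lm(Sepal.Width ~ poly(Sepal.Length, 2), data = setosa)

level <- 0.95
p <- predict(fit, se.fit = TRUE)

# Unadjusted two-sided t quantile on the residual degrees of freedom.
tcrit <- qt(1 - (1 - level) / 2, df = p$df)

manual <- cbind(fit = p$fit,
                lwr = p$fit - tcrit * p$se.fit,
                upr = p$fit + tcrit * p$se.fit)
builtin <- predict(fit, interval = "confidence", level = level)

# TRUE means predict's interval is the plain pointwise t interval.
all.equal(manual, builtin, check.attributes = FALSE)
```

If the comparison returns TRUE, the interval from predict uses no multiplicity adjustment, which is what a pointwise band means.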

Fit the model and construct simultaneous confidence intervals:

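# Sort by Sepal.Length so the bands can later be drawn as lines
setosa <- subset(iris, Species == "setosa")
setosa <- setosa[order(setosa$Sepal.Length), ]
fit <- lm(Sepal.Width ~ poly(Sepal.Length, 2), setosa)

# One linear function per observation: the fitted value at that point
K <- cbind(1, poly(setosa$Sepal.Length, 2))
cht <- multcomp::glht(fit, linfct = K)
# confint() on a glht object adjusts for multiplicity by default
cci <- confint(cht)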
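
Before plotting, the two approaches can be compared through their critical values. This is a rough sketch, assuming the multcomp package is installed and that confint() stores the adjusted critical value in the "calpha" attribute of the confint matrix; crit_simultaneous and crit_pointwise are illustrative names.

```r
# Rebuild the objects from the answer so this block runs on its own.
setosa <- subset(iris, Species == "setosa")
setosa <- setosa[order(setosa$Sepal.Length), ]
fit <- lm(Sepal.Width ~ poly(Sepal.Length, 2), data = setosa)
K <- cbind(1, poly(setosa$Sepal.Length, 2))
cci <- confint(multcomp::glht(fit, linfct = K))

# Critical value used for the simultaneous intervals
# (assumption: stored by multcomp as the "calpha" attribute).
crit_simultaneous <- attr(cci$confint, "calpha")

# Critical value behind predict(fit, interval = "confidence").
crit_pointwise <- qt(0.975, df = df.residual(fit))

c(simultaneous = crit_simultaneous, pointwise = crit_pointwise)
```

The simultaneous value should come out larger than the pointwise one, so the simultaneous band is wider at every point along the curve.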

Reshape and plot:

cc <- as.data.frame(cci$confint)
cc$Sepal.Length <- setosa$Sepal.Length
cc <- reshape2::melt(cc[, 2:4], id.var = "Sepal.Length")

library(ggplot2)
ggplot(data = setosa, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_smooth(method ="lm", formula = y ~ poly(x,2)) +
  geom_line(data = cc, 
            aes(x = Sepal.Length, y = value, group = variable),
            colour = "red")

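predict(.., interval = "confidence") does not appear to produce simultaneous confidence intervals. The simultaneous bands (red lines) are wider than the grey band drawn by geom_smooth(), so the grey band is a pointwise confidence band.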
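
The same conclusion can be checked directly against the ribbon that geom_smooth() draws. The sketch below assumes that ggplot2's layer_data() returns the computed x, ymin and ymax columns for the smooth layer; band and pw are illustrative names.

```r
library(ggplot2)

setosa <- subset(iris, Species == "setosa")
p <- ggplot(setosa, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2))

# Grid points and ribbon limits computed by the smooth layer.
band <- layer_data(p)

# Pointwise intervals from predict() on the same grid.
fit <- lm(Sepal.Width ~ poly(Sepal.Length, 2), data = setosa)
pw <- predict(fit, newdata = data.frame(Sepal.Length = band$x),
              interval = "confidence")

# TRUE for both means the grey ribbon is the pointwise band.
all.equal(band$ymin, unname(pw[, "lwr"]))
all.equal(band$ymax, unname(pw[, "upr"]))
```

Both comparisons use the default 95% level on each side. A different level in geom_smooth() would need the same level passed to predict.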

Does ggplot2's geom_smooth() show pointwise confidence bands or simultaneous confidence bands?
