Question

I can't reproduce the way stat_smooth computes its confidence interval.

Let's generate some data and fit a simple model:

library(tidyverse)    
# sample data
df = tibble(
  x = runif(10),
  y = x + rnorm(10)*0.2
)

# simple linear model
model = lm(y ~ x, df)

Now use predict() to generate the fitted values and the confidence interval:

# predict 
df$predicted = predict(
  object = model,
  newdata = df
)

# predict 95% confidence interval
df$CI = predict(
  object = model,
  newdata = df,
  se.fit = TRUE
)$se.fit * qnorm(1 - (1-0.95)/2)

Note that qnorm is used to scale the standard error up to a 95% CI.

Plot the data (black points), geom_smooth (black line + grey ribbon), and the predicted band (red and blue lines).

ggplot(df) +
  aes(x = x, y = y) +
  geom_point(size = 2) +
  geom_smooth(method = "lm", level = 0.95, fullrange = TRUE, color = "black") +
  geom_line(aes(y = predicted + CI), color = "blue") + # upper
  geom_line(aes(y = predicted - CI), color = "red") + # lower
  theme_classic()

The red and blue lines should coincide with the edges of the ribbon. What am I doing wrong?

Answer 1

As @Dason pointed out in the comments, the answer is that geom_smooth uses a t-distribution, not a normal distribution.
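To see why the choice of distribution matters with only 10 points, here is a minimal sketch comparing the two multipliers. It assumes 8 residual degrees of freedom (10 observations minus the 2 estimated coefficients of y ~ x); the variable names are illustrative only.

```r
# Multiplier applied to the standard error for a two-sided 95% interval
level <- 0.95
p <- 1 - (1 - level) / 2

z_mult <- qnorm(p)        # normal quantile, about 1.96
t_mult <- qt(p, df = 8)   # t quantile, about 2.31 (assumed: n = 10, 2 coefficients)

# The t multiplier is larger, so the t-based band is wider than the normal one
c(normal = z_mult, t = t_mult)
```

With a sample this small the t-based band is roughly 18% wider, which would be consistent with the red and blue lines sitting inside the grey ribbon.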

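In my original question, replace qnorm(1 - (1-0.95)/2) with qt(1 - (1-0.95)/2, df.residual(model)) to make the lines match. The degrees of freedom should be the model's residual degrees of freedom (nrow(df) - 2 for this two-coefficient model), not nrow(df); using nrow(df) gets close but leaves the lines slightly inside the ribbon.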
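One way to check the t-based calculation without plotting is to compare it against the interval that predict() computes itself. The sketch below is an illustration, not the code from the question: it uses a plain data.frame instead of a tibble, and the seed is arbitrary. It relies on predict() for lm objects returning the residual degrees of freedom in its df element when se.fit = TRUE.

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible

# Data of the same shape as in the question
df <- data.frame(x = runif(10))
df$y <- df$x + rnorm(10) * 0.2

model <- lm(y ~ x, data = df)

# Manual band: standard error times a t quantile with the residual df
pred <- predict(model, newdata = df, se.fit = TRUE)
half_width <- pred$se.fit * qt(1 - (1 - 0.95) / 2, df = pred$df)

# Reference band computed by predict() itself
ref <- predict(model, newdata = df, interval = "confidence", level = 0.95)

# Both comparisons should report TRUE
all.equal(unname(pred$fit - half_width), unname(ref[, "lwr"]))
all.equal(unname(pred$fit + half_width), unname(ref[, "upr"]))
```

Here pred$df is 8, the same value as df.residual(model). As far as I can tell, the ribbon drawn by geom_smooth(method = "lm") corresponds to this same confidence interval evaluated on a grid of x values.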

How is the confidence interval generated in geom_smooth using `level`?
