Here is my data (example):
EXAMPLE<-data.frame(
X=c(99.6, 98.02, 96.43, 94.44, 92.06, 90.08, 87.3, 84.92, 82.14,
79.76, 76.98, 74.21, 71.03, 67.86, 65.08, 62.3, 59.92, 56.35,
52.38, 45.63, 41.67, 35.71, 30.95, 24.6, 17.86, 98.44, 96.48,
94.14, 92.19, 89.84, 87.5, 84.38, 82.42, 78.52, 76.17, 73.83,
70.7, 65.63, 62.89, 60.16, 58.2, 54.69, 52.73, 49.61, 46.09,
42.58, 40.23, 36.72, 32.81),
Y=c(3.62, 9.78, 15.22, 19.93, 24.64, 30.43, 35.14, 39.49, 44.93,
49.64, 52.9, 57.97, 62.68, 66.3, 70.29, 73.55, 76.09, 78.62,
80.8, 82.61, 84.42, 87.32, 91.67, 96.01, 99.28, 3.85, 8.55, 11.97,
17.52, 20.94, 25.21, 29.49, 34.62, 38.89, 41.88, 46.58, 50.43,
57.26, 63.25, 67.09, 70.09, 74.79, 79.06, 82.91, 88.03, 91.88,
95.3, 97.86, 99.57))
I fitted a polynomial regression:
> LinearModel.2 <- lm(Y ~ X +I(X ^2), data=EXAMPLE)
> summary(LinearModel.2)
Call:
lm(formula = Y ~ X + I(X^2), data = CET2M3)
Residuals:
Min 1Q Median 3Q Max
-7.3278 -4.0767 0.2222 4.7403 6.3660
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 97.041626 5.491862 17.670 < 2e-16 ***
X 0.339600 0.183034 1.855 0.07 .
I(X^2) -0.012709 0.001416 -8.975 1.13e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.706 on 46 degrees of freedom
Multiple R-squared: 0.9755, Adjusted R-squared: 0.9745
F-statistic: 917 on 2 and 46 DF, p-value: < 2.2e-16
and computed the 95% confidence intervals for the coefficients:
> Confint(LinearModel.2, level=0.95)
Estimate 2.5 % 97.5 %
(Intercept) 97.04162631 85.98708171 108.096170906
X 0.33959960 -0.02882900 0.708028199
I(X^2) -0.01270946 -0.01555982 -0.009859103
Then, when I plot the regression with ggplot2, I get the following figure:
qplot(X, Y, data=EXAMPLE, geom=c("point", "smooth"), method="lm", formula= y ~ poly(x, 2))
Finally, when I predict the Y value and its interval from the polynomial regression with the following commands:
newdata50 = data.frame(X=50)
predict(LinearModel.2,newdata50,interval="predict")
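I get the following values:
Fit = 82.24796, 2.5% = 72.57762, 97.5% = 91.91829
The fit value matches the ggplot2 plot exactly, but the interval does not match the band drawn there.
What is going wrong? Which one should I trust? Why are they different?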
Answer 0 (score: 5)
A prediction interval is not the same thing as a confidence interval. Compare:
predict(LinearModel.2,newdata50,interval="predict")
# fit lwr upr
# 1 82.24791 72.58054 91.91528
predict(LinearModel.2,newdata50,interval="confidence")
# fit lwr upr
# 1 82.24791 80.30089 84.19494
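ggplot2 draws the confidence interval for the fitted mean, not the prediction interval for a new observation, which is why its band is narrower than the output of interval="predict".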
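To see where the two intervals come from, here is a minimal sketch that rebuilds both by hand at X = 50 and then draws both bands with base graphics. It assumes the EXAMPLE data from the question. The names fit, new, ci_manual and pi_manual are mine, not from the original post. The confidence interval uses only the standard error of the fitted mean. The prediction interval adds the residual variance of a single new observation, so it is always wider.

```r
# Data copied from the question.
EXAMPLE <- data.frame(
  X = c(99.6, 98.02, 96.43, 94.44, 92.06, 90.08, 87.3, 84.92, 82.14,
        79.76, 76.98, 74.21, 71.03, 67.86, 65.08, 62.3, 59.92, 56.35,
        52.38, 45.63, 41.67, 35.71, 30.95, 24.6, 17.86, 98.44, 96.48,
        94.14, 92.19, 89.84, 87.5, 84.38, 82.42, 78.52, 76.17, 73.83,
        70.7, 65.63, 62.89, 60.16, 58.2, 54.69, 52.73, 49.61, 46.09,
        42.58, 40.23, 36.72, 32.81),
  Y = c(3.62, 9.78, 15.22, 19.93, 24.64, 30.43, 35.14, 39.49, 44.93,
        49.64, 52.9, 57.97, 62.68, 66.3, 70.29, 73.55, 76.09, 78.62,
        80.8, 82.61, 84.42, 87.32, 91.67, 96.01, 99.28, 3.85, 8.55, 11.97,
        17.52, 20.94, 25.21, 29.49, 34.62, 38.89, 41.88, 46.58, 50.43,
        57.26, 63.25, 67.09, 70.09, 74.79, 79.06, 82.91, 88.03, 91.88,
        95.3, 97.86, 99.57))

fit <- lm(Y ~ X + I(X^2), data = EXAMPLE)   # same model as LinearModel.2
new <- data.frame(X = 50)

# Fitted mean, its standard error, residual scale and degrees of freedom.
p     <- predict(fit, new, se.fit = TRUE)
yhat  <- unname(p$fit)
tcrit <- qt(0.975, df = p$df)               # two-sided 95% critical value

# Confidence interval: uncertainty of the fitted mean only.
ci_manual <- yhat + c(-1, 1) * tcrit * p$se.fit
# Prediction interval: adds the residual variance of one new observation.
pi_manual <- yhat + c(-1, 1) * tcrit * sqrt(p$se.fit^2 + p$residual.scale^2)

# Reference values from predict() for comparison.
ci_ref <- predict(fit, new, interval = "confidence")
pi_ref <- predict(fit, new, interval = "prediction")
print(rbind(confidence = ci_manual, prediction = pi_manual))

# Both bands over the observed range of X.
grid <- data.frame(X = seq(min(EXAMPLE$X), max(EXAMPLE$X), length.out = 100))
conf <- predict(fit, grid, interval = "confidence")
pred <- predict(fit, grid, interval = "prediction")
plot(Y ~ X, data = EXAMPLE)
matlines(grid$X, conf, lty = c(1, 2, 2), col = "blue")              # fit + confidence band
matlines(grid$X, pred[, c("lwr", "upr")], lty = 3, col = "red")     # prediction band
```

The dashed blue band should correspond to what the ggplot2 smoother shades. The dotted red band is the one that matches interval="predict". Which one to trust depends on the question being asked: the confidence band for where the mean curve lies, the prediction band for where a single new measurement is likely to fall.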