Confidence intervals in a ggplot2 plot differ from the values obtained with R's predict function

Time: 2015-02-04 18:15:51

Tags: r ggplot2 predict confidence-interval

This is my data (example):

EXAMPLE<-data.frame(
    X=c(99.6, 98.02, 96.43, 94.44, 92.06, 90.08, 87.3, 84.92, 82.14, 
79.76, 76.98, 74.21, 71.03, 67.86, 65.08, 62.3, 59.92, 56.35, 
52.38, 45.63, 41.67, 35.71, 30.95, 24.6, 17.86, 98.44, 96.48, 
94.14, 92.19, 89.84, 87.5, 84.38, 82.42, 78.52, 76.17, 73.83, 
70.7, 65.63, 62.89, 60.16, 58.2, 54.69, 52.73, 49.61, 46.09, 
42.58, 40.23, 36.72, 32.81),
    Y=c(3.62, 9.78, 15.22, 19.93, 24.64, 30.43, 35.14, 39.49, 44.93, 
49.64, 52.9, 57.97, 62.68, 66.3, 70.29, 73.55, 76.09, 78.62, 
80.8, 82.61, 84.42, 87.32, 91.67, 96.01, 99.28, 3.85, 8.55, 11.97, 
17.52, 20.94, 25.21, 29.49, 34.62, 38.89, 41.88, 46.58, 50.43, 
57.26, 63.25, 67.09, 70.09, 74.79, 79.06, 82.91, 88.03, 91.88, 
95.3, 97.86, 99.57))

I fitted a polynomial regression:

> LinearModel.2 <- lm(Y ~ X +I(X ^2), data=EXAMPLE)

> summary(LinearModel.2)

Call:
lm(formula = Y ~ X + I(X^2), data = CET2M3)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.3278 -4.0767  0.2222  4.7403  6.3660 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 97.041626   5.491862  17.670  < 2e-16 ***
X            0.339600   0.183034   1.855     0.07 .  
I(X^2)      -0.012709   0.001416  -8.975 1.13e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.706 on 46 degrees of freedom
Multiple R-squared:  0.9755,   Adjusted R-squared:  0.9745
F-statistic:   917 on 2 and 46 DF,  p-value: < 2.2e-16

and computed the 95% confidence intervals for the coefficients:

> Confint(LinearModel.2, level=0.95)
               Estimate       2.5 %        97.5 %
(Intercept) 97.04162631 85.98708171 108.096170906
X            0.33959960 -0.02882900   0.708028199
I(X^2)      -0.01270946 -0.01555982  -0.009859103

On the other hand, when I plot the regression with ggplot2, I get the following image:

qplot(X, Y, data=EXAMPLE, geom=c("point", "smooth"), method="lm", formula= y ~ poly(x, 2))

[image: plot produced by the command above]

Finally, when I predict the Y value and its confidence interval from the polynomial regression with the following commands:

newdata50 = data.frame(X=50) 
predict(LinearModel.2,newdata50,interval="predict")

I get the following values:

Fit=82,24796    2.5=72,57762    97.5=91,91829

Although the Fit value matches the value shown in the ggplot2 plot exactly, the confidence interval does not.

What is going wrong? Which one should I trust? Why are they not the same?

1 Answer:

Answer 0: (score: 5)

There is a difference between a prediction interval and a confidence interval. Compare:

predict(LinearModel.2,newdata50,interval="predict")
#        fit      lwr      upr
# 1 82.24791 72.58054 91.91528
predict(LinearModel.2,newdata50,interval="confidence")
#        fit      lwr      upr
# 1 82.24791 80.30089 84.19494
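
To see where the two intervals come from, here is a minimal sketch that rebuilds both by hand from the pieces `predict` returns with `se.fit = TRUE`. It reuses the data and model from the question; the names `fit`, `new_point`, `ci_by_hand` and `pi_by_hand` are only introduced here for illustration.

```r
# Data and model from the question
EXAMPLE <- data.frame(
  X = c(99.6, 98.02, 96.43, 94.44, 92.06, 90.08, 87.3, 84.92, 82.14,
        79.76, 76.98, 74.21, 71.03, 67.86, 65.08, 62.3, 59.92, 56.35,
        52.38, 45.63, 41.67, 35.71, 30.95, 24.6, 17.86, 98.44, 96.48,
        94.14, 92.19, 89.84, 87.5, 84.38, 82.42, 78.52, 76.17, 73.83,
        70.7, 65.63, 62.89, 60.16, 58.2, 54.69, 52.73, 49.61, 46.09,
        42.58, 40.23, 36.72, 32.81),
  Y = c(3.62, 9.78, 15.22, 19.93, 24.64, 30.43, 35.14, 39.49, 44.93,
        49.64, 52.9, 57.97, 62.68, 66.3, 70.29, 73.55, 76.09, 78.62,
        80.8, 82.61, 84.42, 87.32, 91.67, 96.01, 99.28, 3.85, 8.55, 11.97,
        17.52, 20.94, 25.21, 29.49, 34.62, 38.89, 41.88, 46.58, 50.43,
        57.26, 63.25, 67.09, 70.09, 74.79, 79.06, 82.91, 88.03, 91.88,
        95.3, 97.86, 99.57))
fit <- lm(Y ~ X + I(X^2), data = EXAMPLE)
new_point <- data.frame(X = 50)

# se.fit is the standard error of the estimated mean response at X = 50;
# residual.scale is the residual standard deviation of the model.
p <- predict(fit, new_point, se.fit = TRUE)
t_crit <- qt(0.975, df = p$df)

# Confidence interval: uncertainty about the mean response only
ci_by_hand <- unname(p$fit) + c(-1, 1) * t_crit * unname(p$se.fit)

# Prediction interval: the same uncertainty plus the residual variance
# of a single new observation
pi_by_hand <- unname(p$fit) +
  c(-1, 1) * t_crit * sqrt(unname(p$se.fit)^2 + p$residual.scale^2)

# Intervals reported by predict() for comparison
ci_predict <- predict(fit, new_point, interval = "confidence")
pi_predict <- predict(fit, new_point, interval = "prediction")

print(rbind(confidence = ci_by_hand, prediction = pi_by_hand))
```

The prediction interval adds the residual variance under the square root, so for the same point it is always wider than the confidence interval.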

ggplot draws the confidence interval, not the prediction interval.
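
If the prediction band is wanted in the plot as well, one possible approach is to compute it with `predict` over a grid of X values and draw it with `geom_ribbon`. This is only a sketch that assumes the ggplot2 package is installed; `grid`, `pred_band` and `p` are illustrative names, and the grid size of 100 is an arbitrary choice.

```r
library(ggplot2)

# Data and model from the question
EXAMPLE <- data.frame(
  X = c(99.6, 98.02, 96.43, 94.44, 92.06, 90.08, 87.3, 84.92, 82.14,
        79.76, 76.98, 74.21, 71.03, 67.86, 65.08, 62.3, 59.92, 56.35,
        52.38, 45.63, 41.67, 35.71, 30.95, 24.6, 17.86, 98.44, 96.48,
        94.14, 92.19, 89.84, 87.5, 84.38, 82.42, 78.52, 76.17, 73.83,
        70.7, 65.63, 62.89, 60.16, 58.2, 54.69, 52.73, 49.61, 46.09,
        42.58, 40.23, 36.72, 32.81),
  Y = c(3.62, 9.78, 15.22, 19.93, 24.64, 30.43, 35.14, 39.49, 44.93,
        49.64, 52.9, 57.97, 62.68, 66.3, 70.29, 73.55, 76.09, 78.62,
        80.8, 82.61, 84.42, 87.32, 91.67, 96.01, 99.28, 3.85, 8.55, 11.97,
        17.52, 20.94, 25.21, 29.49, 34.62, 38.89, 41.88, 46.58, 50.43,
        57.26, 63.25, 67.09, 70.09, 74.79, 79.06, 82.91, 88.03, 91.88,
        95.3, 97.86, 99.57))
fit <- lm(Y ~ X + I(X^2), data = EXAMPLE)

# Prediction interval evaluated on an evenly spaced grid of X values
grid <- data.frame(X = seq(min(EXAMPLE$X), max(EXAMPLE$X), length.out = 100))
pred_band <- cbind(grid, predict(fit, grid, interval = "prediction"))

p <- ggplot(EXAMPLE, aes(X, Y)) +
  # Wide, light band: prediction interval computed above
  geom_ribbon(data = pred_band, aes(x = X, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, alpha = 0.15) +
  geom_point() +
  # Fitted curve with geom_smooth's own (confidence) band
  geom_smooth(method = "lm", formula = y ~ poly(x, 2), level = 0.95)
print(p)
```

The narrow band drawn by `geom_smooth` should correspond to `interval = "confidence"`, while the wider ribbon corresponds to `interval = "prediction"`.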