R: Confidence intervals and prediction

Time: 2018-07-19 14:16:12

Tags: r predict

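I have a question about confidence intervals and prediction.

I have a dataset (called "data") consisting of 158 observations of two variables, S and N, although N is missing for some observations. I was able to plot the regression line and its 95% confidence interval using qplot. So far so good. Now I have a second, entirely separate dataset (called "data2") containing 127 observations of N, and I would like to know which S each one corresponds to and what the confidence intervals for those S values are. I can't seem to predict these values. Maybe someone here can help me?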

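This is what I tried:

data.lm = lm(data$S~data$N)
newdata = data.frame(data2$N)
predict(data.lm, newdata, interval=c("confidence"))

This gives me the warning message

Warning message:
 'data2' had 127 rows but variables found have 158 rows

and it returns 158 rows of fit, lwr and upr values, which clearly do not belong to my data2 N values:

  fit      lwr      upr
1   37.88919 37.66022 38.11816
2   38.38123 38.23795 38.52451
3         NA       NA       NA
4   37.59720 37.26820 37.92621
5   38.09655 37.92488 38.26823
6   37.77301 37.50590 38.04012
...

I get the same problem when I try a specific value, such as

data.lm = lm(data$S~data$N)
newdata = data.frame(N=5)
predict(data.lm, newdata, interval=c("confidence"))

It gives me the warning and the exact same output.

I may be being stupid here, but I have found many similar questions, and the solution always seems to be what I have already tried. Why doesn't predict give me a single row of fit, upr and lwr values, instead of operating on the data the lm was based on?

Thank you very much

EDIT:

The data I used (the second dataset, for which I want to predict S values, is not shown):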

structure(list(S = c(36.7735, 36.7735, 36.7735, 36.7735, 36.7735, 
36.7735, 36.7735, 36.7735, 36.7735, 37.307, 37.307, 37.307, 37.307, 
37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 
37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 
37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 
37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 
37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 37.307, 
37.307, 37.307, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 
38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 
38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 
38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 
38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 
38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 
38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 
38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.35525, 38.766, 
38.766, 38.766, 38.766, 38.766, 38.766, 38.766, 38.766, 38.766, 
39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 
39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 
39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 
39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 
39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 39.639, 
39.639), N = c(7.740086957, 9.716043478, NA, 6.567521739, 8.572826087, 
7.273521739, 8.689478261, NA, 8.112565217, 9.370289089, 8.429912766, 
9.178733143, 8.136725442, 9.127494831, 7.91849608, 8.775866462, 
8.733992185, 8.47272603, 8.700879331, 9.57630994, 9.184129237, 
9.501760687, 10.04023077, 9.887214462, 7.947499285, 8.681177515, 
10.14076961, 8.990465816, 10.35920222, 8.793812067, 8.962143225, 
NA, 10.89773618, 9.646558574, NA, 8.708896587, 8.482467842, 9.490473018, 
9.724324492, 9.185016805, 9.367232547, 9.447726264, 10.49359078, 
9.086775124, 8.951230645, 8.438922723, 7.612619197, 8.961837755, 
NA, 8.473436422, 9.487274967, 8.839257463, 8.019280063, 8.829296324, 
9.089621228, 12.66471665, NA, 7.93418751, 8.442549778, 12.43150655, 
12.78812747, 9.499177641, 8.88329767, 12.06733547, 8.694287059, 
8.733657869, 8.976294071, 11.61797642, NA, 9.223855496, 12.14555242, 
9.177782834, 10.50860256, 8.830982089, 9.338875366, 11.10966871, 
9.009297476, 9.114841643, 9.145197506, 7.508668256, 8.49838577, 
11.70012856, 8.859038138, 9.984367135, 11.18147471, 8.504456058, 
9.30440283, 8.491741245, 9.154016228, 7.969788358, 8.890420803, 
9.391405036, 8.023003384, 12.06142165, 10.0134321, 7.829115845, 
8.619827639, 7.965320738, 9.718533292, 9.642541995, 9.221551363, 
9.638749044, 8.728496275, 7.882667305, 8.059467865, 10.88596514, 
11.52200146, 8.465388516, 10.89040717, 8.652714649, 8.570009902, 
9.575021118, 10.20114206, 8.030898045, 9.325947744, 9.383493864, 
NA, 10.98718012, 13.58808295, 9.987675873, 11.59305101, 8.559274188, 
10.87432015, 9.530456451, NA, 13.39915598, 14.50068995, 11.4377845, 
9.874845508, 8.419345084, 9.833591752, 8.734194935, NA, 8.751516192, 
10.74365351, 10.94957982, 11.43931675, 9.26461008, 10.88196331, 
10.01986719, 8.521178027, 8.346310841, 9.116175981, 12.55888826, 
11.55922318, 11.62731629, 9.974676715, 8.659476016, 9.714302784, 
11.69627731, 9.404085345, 8.417580572, 10.26841052, 8.0505316, 
14.56194307, 8.496000239, 8.36501204, 9.105109509)), .Names = c("S", 
"N"), class = "data.frame", row.names = c(NA, -158L))

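1 Answer:

Answer 0: (score: 0)

This is due to the way you specified the model. You referenced the original data.frame inside the formula, so predict always ends up looking for that data rather than for the matching variables in newdata.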

mdl1 <- lm(mtcars$hp~mtcars$disp)
predict(mdl1,data.frame(disp=1:3))
        1         2         3         4         5         6         7         8 
115.74296 115.74296  92.99022 158.62312 203.25349 144.18388 203.25349 109.92351 
        9        10        11        12        13        14        15        16 
107.34195 119.06836 119.06836 166.41155 166.41155 166.41155 252.25938 247.00875 
       17        18        19        20        21        22        23        24 
238.25770  80.16993  78.85727  76.84453  98.28461 184.87627 178.75054 198.87796 
       25        26        27        28        29        30        31        32 
220.75559  80.30119  98.37212  87.34579 199.31551 109.17967 177.43788  98.67840 
Warning message:
'newdata' had 3 rows but variables found have 32 rows 
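
One way to see why this happens is to compare the variable names each model stores. The sketch below uses only base R and the built-in mtcars data; the object names mdl_dollar and mdl_data are made up for illustration.

```r
# Two ways of specifying the same regression
mdl_dollar <- lm(mtcars$hp ~ mtcars$disp)   # data frame named inside the formula
mdl_data   <- lm(hp ~ disp, data = mtcars)  # bare names plus the data argument

# The model frame shows what each model considers its variables to be
names(model.frame(mdl_dollar))  # "mtcars$hp" "mtcars$disp"
names(model.frame(mdl_data))    # "hp" "disp"

# predict() evaluates the formula's terms using newdata first. A term written
# as mtcars$disp refers to the object mtcars, which is not a column of newdata,
# so it is looked up outside newdata and all 32 original rows come back.
```

With the first form, no column name in newdata can ever match, which is why the warning about mismatched row counts appears.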

What you should do is specify only the variable names in the formula and supply the original data source through lm's data argument.

mdl2 <- lm(hp~disp,mtcars)
predict(mdl2,data.frame(disp=1:3))
       1        2        3 
46.17208 46.60964 47.04719
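
Applied to the situation in the question, the same fix might look like the sketch below. The data frames here are simulated stand-ins (the coefficients, ranges and seed are arbitrary assumptions), not the real data or data2. Note that data.frame(data2$N) names its column data2.N, so the column has to be named N explicitly for predict to match it.

```r
set.seed(1)  # simulated stand-ins; the real data and data2 are not used here

# Hypothetical training data: 158 rows with a few missing N values
data <- data.frame(N = runif(158, 6, 15))
data$S <- 35 + 0.3 * data$N + rnorm(158, sd = 0.5)
data$N[c(3, 8)] <- NA  # lm() drops incomplete rows by default

# Hypothetical new data: 127 N values without S
data2 <- data.frame(N = runif(127, 6, 15))

# Bare variable names in the formula, data frame passed via `data`
data.lm <- lm(S ~ N, data = data)

# The column in newdata must be called N, matching the formula
newdata <- data.frame(N = data2$N)

# One row per new N value, with columns fit, lwr and upr
pred <- predict(data.lm, newdata, interval = "confidence", level = 0.95)
dim(pred)
head(pred)

# A single value works the same way and returns a single row
one <- predict(data.lm, data.frame(N = 5), interval = "confidence")
```

Here pred has 127 rows, one for each N in the new data, rather than the 158 rows of the data the model was fitted on.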