Question

This is related to the dataset I posted about before. I'm still working on the regression.

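My data appear to follow an inverse function of the form y = m / x + b. I think I've figured out how to optimize the coefficients m and b. However, when I use predict() to compute prediction intervals, the lower and upper bounds are identical to the model's fitted values.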
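Because y = m / x + b is linear in the parameters m and b, ordinary least squares on the transformed predictor 1/x gives the RSS-minimizing m and b in closed form, so an iterative optim() call is not strictly needed. A minimal sketch on hypothetical toy data (the x and y values below are made up, not the poster's data), comparing the textbook OLS formulas with lm():

```r
# Hypothetical toy data (not the poster's data), roughly y = 100/x + 8 plus noise
x <- c(2, 5, 10, 20, 50, 100, 200)
y <- c(58.3, 27.5, 18.4, 12.6, 10.3, 8.7, 8.6)

z <- 1 / x                      # transformed predictor
m <- cov(z, y) / var(z)         # OLS slope on 1/x
b <- mean(y) - m * mean(z)      # OLS intercept

fit <- lm(y ~ I(1 / x))         # same model, fitted by lm()
print(c(m = m, b = b))
print(coef(fit))                # (Intercept) = b, I(1/x) = m
```

optim() with its default Nelder-Mead method only approximates this minimum, which would explain why the poster's old and revised fitted values differ slightly in the third decimal place.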

I have a hunch that the problem lies in how I'm using lm(), but I'm not confident. I'm new to the lm(), predict(), and optim() functions.

Input:


    x <- c(52.145, 64.7763333333333, 5.67666666666667, 9.20433333333333, 
        27.2216666666667, 24.582, 110.125333333333, 204.760666666667, 
        22.5476666666667, 36.8053333333333, 26.651, 21.918, 27.0943333333333, 
        40.3293333333333, 74.2676666666667, 47.4926666666667, 6.52833333333333, 
        8.256, 22.8563333333333, 78.8866666666667, 215.426666666667, 
        126.403, 153.092333333333, 145.178, 52.748, 52.625, 77.714, 90.718, 
        149.326666666667, 166.201666666667, 2.75966666666667, 4.48933333333333, 
        100.533666666667, 63.2303333333333, 177.810333333333, 262.929333333333, 
        1.89166666666667, 5.78433333333333, 8.079, 7.904, 219.011333333333, 
        285.053, 268.940666666667, 314.485333333333, 26.394, 32.5373333333333)

    y <- c(8.49946738825106, 7.29520245195293, 30.8997268609617, 15.5103125126096, 
        15.4272708801451, 5.20309902020628, 5.54009400197214, 4.78299334331501, 
        19.6108718604226, 15.0078723188159, 14.7205349306272, 22.8457479040348, 
        7.51298922530701, 30.523886336804, 16.6780115471446, 17.0293235066314, 
        21.4184087138986, 27.7684786021191, 16.2257718439185, 12.9437536112634, 
        7.1482850633121, 5.88468308537335, 14.903472797458, 6.78855737045925, 
        13.8022476754789, 9.27299793481615, 5.70560924863637, 5.09707444175834, 
        2.62057882780782, 4.67082818412207, 31.2134177720261, 62.7183002243229, 
        17.3998126739725, 7.27920875242628, 6.97326341930058, 3.644112408786, 
        83.0020817353163, 10.5840077295553, 36.8081152007687, 26.1648393978833, 
        2.72967129507292, 3.34330462563615, 7.24378060360362, 5.62587984381713, 
        18.2047389283449, 5.81239109000072)

    ############# This section of code no longer needed #####################
    ### dat <- data.frame(x, y)           # make a dataframe of x and y
    ###
    ### min.RSS <- function(data, par){                      
    ###     with(data, sum( ( (par[1]/x) + par[2] - y)^2))   
    ### } # end function definition
    ###
    ### result <- optim(par=c(0,1), fn=min.RSS, data=dat)    
    ### m <- result$par[1]
    ### b <- result$par[2]
    ###
    ### y.func <- m/x + b             
    ###
    ### x.inv <- (y.func-b)/m
    ###
    ### fit <- lm(y.func ~ x.inv) # This is the old code
    ###############################################################

    fit <- lm(y ~ I(1/x))    # revised version of this call

    predict(fit, interval="prediction", level=0.95)

    ###################
    ## for visualization

    plot(x, y)

    points(x, predict(fit), pch=3, col="red")
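To draw the prediction interval on the plot as well, one option is to predict on a sorted grid of new x values, since the poster's x vector is not sorted and lines() on it would zigzag. A minimal sketch, again on hypothetical toy data; the names grid and band are illustrative:

```r
# Hypothetical toy data (not the poster's data), roughly y = 100/x + 8 plus noise
x <- c(2, 5, 10, 20, 50, 100, 200)
y <- c(58.3, 27.5, 18.4, 12.6, 10.3, 8.7, 8.6)

fit <- lm(y ~ I(1 / x))

# Evaluate the fit and the 95% prediction interval on a fine, sorted grid of x
grid <- data.frame(x = seq(min(x), max(x), length.out = 200))
band <- predict(fit, newdata = grid, interval = "prediction", level = 0.95)

plot(x, y)
lines(grid$x, band[, "fit"], col = "red")            # fitted curve
lines(grid$x, band[, "lwr"], col = "red", lty = 2)   # lower prediction bound
lines(grid$x, band[, "upr"], col = "red", lty = 2)   # upper prediction bound
```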

And the output [with the old/original code]:

             fit       lwr       upr
    1  10.486813 10.486813 10.486813
    2  10.010511 10.010511 10.010511
    3  30.481423 30.481423 30.481423
    4  21.882111 21.882111 21.882111
    5  12.723167 12.723167 12.723167
    6  13.225601 13.225601 13.225601
    7   9.200808  9.200808  9.200808
    8   8.666266  8.666266  8.666266
    9  13.693084 13.693084 13.693084
    10 11.504828 11.504828 11.504828
    11 12.823355 12.823355 12.823355
    12 13.855366 13.855366 13.855366
    13 12.745157 12.745157 12.745157
    14 11.202439 11.202439 11.202439
    15  9.759222  9.759222  9.759222
    16 10.726086 10.726086 10.726086
    17 27.554335 27.554335 27.554335
    18 23.471612 23.471612 23.471612
    19 13.616798 13.616798 13.616798
    20  9.658806  9.658806  9.658806
    21  8.635468  8.635468  8.635468
    22  9.051868  9.051868  9.051868
    23  8.876202  8.876202  8.876202
    24  8.921556  8.921556  8.921556
    25 10.458890 10.458890 10.458890
    26 10.464534 10.464534 10.464534
    27  9.683169  9.683169  9.683169
    28  9.448235  9.448235  9.448235
    29  8.897182  8.897182  8.897182
    30  8.810579  8.810579  8.810579
    31 54.197799 54.197799 54.197799
    32 36.415584 36.415584 36.415584
    33  9.311154  9.311154  9.311154
    34 10.058587 10.058587 10.058587
    35  8.760547  8.760547  8.760547
    36  8.528651  8.528651  8.528651
    37 75.375577 75.375577 75.375577
    38 30.063788 30.063788 30.063788
    39 23.809605 23.809605 23.809605
    40 24.158661 24.158661 24.158661
    41  8.625791  8.625791  8.625791
    42  8.491054  8.491054  8.491054
    43  8.517823  8.517823  8.517823
    44  8.449236  8.449236  8.449236
    45 12.869890 12.869890 12.869890
    46 11.958763 11.958763 11.958763
    Warning message:
    In predict.lm(fit, interval = "prediction", level = 0.95) :
      predictions on current data refer to _future_ responses
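The warning is raised because predict() was called without newdata and with interval = "prediction". In that case R reuses the training rows and reminds you that the interval is for a new (future) observation at each x, not for the observed y. A minimal sketch on hypothetical toy data showing that passing the same x values explicitly through newdata gives identical numbers without the warning; the helper got_warning() is a hypothetical convenience function defined here, not part of R:

```r
# Hypothetical toy data (not the poster's data), roughly y = 100/x + 8 plus noise
x <- c(2, 5, 10, 20, 50, 100, 200)
y <- c(58.3, 27.5, 18.4, 12.6, 10.3, 8.7, 8.6)
fit <- lm(y ~ I(1 / x))

# Hypothetical helper: TRUE if evaluating expr signals a warning
got_warning <- function(expr) tryCatch({ expr; FALSE }, warning = function(w) TRUE)

# Same intervals, with and without newdata
p_in  <- suppressWarnings(predict(fit, interval = "prediction", level = 0.95))
p_new <- predict(fit, newdata = data.frame(x = x), interval = "prediction", level = 0.95)

print(all.equal(p_in, p_new, check.attributes = FALSE))
print(got_warning(predict(fit, interval = "prediction")))                              # warns
print(got_warning(predict(fit, newdata = data.frame(x = x), interval = "prediction"))) # silent
```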

Output with the revised code:

           fit       lwr      upr
    1  10.485998 -6.563145 27.53514
    2  10.009707 -7.045210 27.06462
    3  30.480153 13.215811 47.74449
    4  21.881036  4.818367 38.94371
    5  12.722301 -4.306515 29.75112
    6  13.224724 -3.801068 30.25052
    7   9.200022 -7.865873 26.26592
    8   8.665493 -8.408451 25.73944
    9  13.692196 -3.331291 30.71568
    10 11.503990 -5.534511 28.54249
    11 12.822487 -4.205681 29.85066
    12 13.854474 -3.168328 30.87728
    13 12.744290 -4.284382 29.77296
    14 11.201608 -5.839812 28.24303
    15  9.758424 -7.299743 26.81659
    16 10.725265 -6.321168 27.77170
    17 27.553131 10.375640 44.73062
    18 23.470501  6.382837 40.55817
    19 13.615912 -3.407918 30.63974
    20  9.658009 -7.401496 26.71751
    21  8.634696 -8.439731 25.70912
    22  9.051086 -8.016987 26.11916
    23  8.875423 -8.195283 25.94613
    24  8.920777 -8.149243 25.99080
    25 10.458075 -6.591391 27.50754
    26 10.463719 -6.585682 27.51312
    27  9.682372 -7.376807 26.74155
    28  9.447443 -7.614942 26.50983
    29  8.896403 -8.173985 25.96679
    30  8.809802 -8.261905 25.88151
    31 54.195988 35.582661 72.80932
    32 36.414178 18.917536 53.91082
    33  9.310366 -7.753947 26.37468
    34 10.057782 -6.996529 27.11209
    35  8.759771 -8.312706 25.83225
    36  8.527881 -8.548237 25.60400
    37 75.373284 54.732823 96.01374
    38 30.062528 12.811712 47.31334
    39 23.808487  6.714784 40.90219
    40 24.157535  7.057330 41.25774
    41  8.625019 -8.449560 25.69960
    42  8.490285 -8.586435 25.56700
    43  8.517053 -8.559238 25.59334
    44  8.448468 -8.628925 25.52586
    45 12.869021 -4.158854 29.89690
    46 11.957915 -5.076589 28.99242
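In the revised output the half-width (upr minus fit) is about 17 for most rows, and many lower bounds fall below 0 even though every observed y is positive. Both follow from how lm() prediction intervals are built: fit ± t × sqrt(sigma² + se.fit²), where sigma is the residual standard error and se.fit the standard error of the fitted mean, so the width is dominated by sigma and is symmetric around the fit. A minimal sketch on hypothetical toy data that rebuilds the interval by hand from predict(..., se.fit = TRUE) and checks it against interval = "prediction":

```r
# Hypothetical toy data (not the poster's data), roughly y = 100/x + 8 plus noise
x <- c(2, 5, 10, 20, 50, 100, 200)
y <- c(58.3, 27.5, 18.4, 12.6, 10.3, 8.7, 8.6)
fit <- lm(y ~ I(1 / x))
nd  <- data.frame(x = x)

# predict() with se.fit = TRUE returns fit, se.fit, df and residual.scale (sigma)
p    <- predict(fit, newdata = nd, se.fit = TRUE)
tq   <- qt(0.975, df = p$df)                          # two-sided 95% t quantile
half <- tq * sqrt(p$residual.scale^2 + p$se.fit^2)    # half-width of each interval
manual <- cbind(fit = p$fit, lwr = p$fit - half, upr = p$fit + half)

auto <- predict(fit, newdata = nd, interval = "prediction", level = 0.95)
print(cbind(manual, half = half))
```

If sigma and se.fit are both 0, half is 0 and fit = lwr = upr, which is exactly the symptom in the original output.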

And here's a graph without intervals, just the model fit

Any idea why the intervals have zero width? In other words, why is fit = upr = lwr?

Thanks!

Answer 1

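Your response is an exact linear transformation of your predictor. In the old code, y.func <- m/x + b is built from the already-fitted coefficients, and x.inv <- (y.func-b)/m is just 1/x, so y.func equals m * x.inv + b with no error at all; the observed y never enters the regression. As a result, the residual standard deviation is 0 (up to floating-point rounding), and so are the standard errors. This means the prediction interval has zero width.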
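This can be reproduced directly. A minimal sketch, assuming hypothetical toy data and fixed values m = 100 and b = 8 standing in for the coefficients optim() would return: the old construction gives intervals of essentially zero width, while regressing the observed y on 1/x gives intervals of positive width.

```r
# Hypothetical toy data (not the poster's data), roughly y = 100/x + 8 plus noise
x <- c(2, 5, 10, 20, 50, 100, 200)
y <- c(58.3, 27.5, 18.4, 12.6, 10.3, 8.7, 8.6)

# Old approach: the response is built from assumed, already-fitted coefficients
m <- 100; b <- 8
y.func <- m / x + b
x.inv  <- (y.func - b) / m                 # numerically 1/x
old    <- lm(y.func ~ x.inv)
p_old  <- predict(old, newdata = data.frame(x.inv = x.inv), interval = "prediction")
w_old  <- p_old[, "upr"] - p_old[, "lwr"]

# Revised approach: regress the observed y on 1/x
new   <- lm(y ~ I(1 / x))
p_new <- predict(new, newdata = data.frame(x = x), interval = "prediction")
w_new <- p_new[, "upr"] - p_new[, "lwr"]

print(range(w_old))   # essentially 0
print(range(w_new))   # clearly positive
```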


If you think y = m / x + b, look at what summary(fit) reports for the old fit:

    > summary(fit)

    Call:
    lm(formula = y.func ~ x.inv)

    Residuals:
           Min         1Q     Median         3Q        Max 
    -4.199e-15 -1.082e-16 -8.320e-17  4.960e-17  8.436e-15 

    Coefficients:
                 Estimate Std. Error   t value Pr(>|t|)    
    (Intercept) 8.044e+00  2.965e-16 2.713e+16   <2e-16 ***
    x.inv       1.274e+02  2.551e-15 4.992e+16   <2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 1.725e-15 on 44 degrees of freedom
    Multiple R-squared:      1,	Adjusted R-squared:      1 
    F-statistic: 2.492e+33 on 1 and 44 DF,  p-value: < 2.2e-16

    Warning message:
    In summary.lm(fit) : essentially perfect fit: summary may be unreliable
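The coefficients in that summary (intercept 8.044, slope 127.4) are simply the m and b that optim() produced: the old lm() call hands them back with residuals at floating-point noise level. A minimal sketch, assuming m and b equal the rounded values from the summary and using the first six x values from the question (rounded); sigma() returns the residual standard error:

```r
# Assumed stand-ins for optim()'s result, rounded from the summary output
m <- 127.4
b <- 8.044

x <- c(52.145, 64.776, 5.677, 9.204, 27.222, 24.582)  # first x values, rounded
y.func <- m / x + b            # fitted values, not the observed y
x.inv  <- (y.func - b) / m     # algebraically identical to 1/x

old <- lm(y.func ~ x.inv)
print(coef(old))    # intercept ~ b, slope ~ m
print(sigma(old))   # residual standard error, essentially 0
```

Checking sigma(fit) (or the summary's "essentially perfect fit" warning) is a quick way to spot a regression whose response was constructed from its own predictor.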

R, predict() function: why are the computed intervals 0? (i.e., fit = upper = lower)
