Interpolation between observations (piecewise approximation) in R

Time: 2018-02-26 18:41:32

Tags: r interpolation

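I am comparing some forecast data against actual values. The forecasts come from three different providers. However, the actual data and the forecast data do not share the same timestamps. I want to compute the error at every forecast point.

In the snapshots below, I want to derive the difference between each provider's forecast and the actual value. The circled points are forecasts for which no actual observation is available at that timestamp, but the actual data shows a clear trend around them. I think this could be expressed with a piecewise approximation, but I don't know how to do it. I have seen the answers in Need a R package for piecewise linear regression?, but they were not very helpful.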

10-day sample: [image]

1-day sample showing the offset between forecast instances and actual data: [image]

Sample data (1 day)

> dput(dt)
structure(list(tme = structure(c(1516221000, 1516224600, 1516228200, 
1516231800, 1516235400, 1516239000, 1516242600, 1516246200, 1516249800, 
1516253400, 1516257000, 1516260600, 1516264200, 1516267800, 1516271400, 
1516275000, 1516278600, 1516282200, 1516285800, 1516289400, 1516293000, 
1516296600, 1516300200, 1516303800, 1516307400, 1516226400, 1516230000, 
1516233600, 1516237200, 1516240800, 1516244400, 1516248000, 1516251600, 
1516255200, 1516258800, 1516262400, 1516266000, 1516269600, 1516273200, 
1516276800, 1516280400, 1516284000, 1516287600, 1516291200, 1516294800, 
1516298400, 1516302000, 1516305600, 1516221000, 1516224600, 1516228200, 
1516231800, 1516235400, 1516239000, 1516242600, 1516246200, 1516249800, 
1516253400, 1516257000, 1516260600, 1516264200, 1516267800, 1516271400, 
1516275000, 1516278600, 1516282200, 1516285800, 1516289400, 1516293000, 
1516296600, 1516300200, 1516303800, 1516307400, 1516233600, 1516244400, 
1516255200, 1516266000, 1516276800, 1516287600, 1516298400), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), degc = c(2.25, 1.69, 2.22, 2.22, 1.65, 1.12, 2.22, 
1.1, 1.13, 2.82, 5.58, 7.8, 7.85, 8.43, 10.05, 10.06, 10.07, 
10.03, 8.89, 6.17, 5.04, 5.01, 3.92, 2.29, 2.29, -1, -1, -1, 
-1, -1, 0, 1, 2, 4, 6, 7, 8, 8, 9, 9, 9, 7, 6, 4, 3, 2, 2, 1, 
-0.16, -1.13, -2.19, -2.98, -3.48, -3.86, -3.84, -2.96, -1.16, 
0.91, 2.61, 3.92, 4.84, 5.59, 6.68, 7.41, 6.82, 5.08, 3.07, 1.56, 
0.51, -0.36, -1.15, -1.86, -2.53, -0.2, -0.9, 4.1, 6.9, 8.1, 
3.6, 2.6), rh = c(0.55, 0.6, 0.51, 0.51, 0.6, 0.52, 0.55, 0.57, 
0.6, 0.49, 0.44, 0.41, 0.38, 0.36, 0.33, 0.33, 0.31, 0.33, 0.35, 
0.39, 0.4, 0.4, 0.43, 0.49, 0.49, 73, 73, 75, 75, 75, 71, 67, 
59, 52, 47, 42, 39, 37, 35, 34, 37, 43, 48, 51, 54, 58, 61, 62, 
0.61, 0.64, 0.67, 0.7, 0.72, 0.74, 0.74, 0.71, 0.65, 0.58, 0.54, 
0.52, 0.51, 0.5, 0.46, 0.44, 0.45, 0.5, 0.57, 0.61, 0.64, 0.65, 
0.67, 0.69, 0.71, 59.1, 62.6, 43.9, 36.7, 33.2, 46.4, 50.1), 
    type = c("Actual", "Actual", "Actual", "Actual", "Actual", 
    "Actual", "Actual", "Actual", "Actual", "Actual", "Actual", 
    "Actual", "Actual", "Actual", "Actual", "Actual", "Actual", 
    "Actual", "Actual", "Actual", "Actual", "Actual", "Actual", 
    "Actual", "Actual", "Provider W", "Provider W", "Provider W", 
    "Provider W", "Provider W", "Provider W", "Provider W", "Provider W", 
    "Provider W", "Provider W", "Provider W", "Provider W", "Provider W", 
    "Provider W", "Provider W", "Provider W", "Provider W", "Provider W", 
    "Provider W", "Provider W", "Provider W", "Provider W", "Provider W", 
    "Provider D", "Provider D", "Provider D", "Provider D", "Provider D", 
    "Provider D", "Provider D", "Provider D", "Provider D", "Provider D", 
    "Provider D", "Provider D", "Provider D", "Provider D", "Provider D", 
    "Provider D", "Provider D", "Provider D", "Provider D", "Provider D", 
    "Provider D", "Provider D", "Provider D", "Provider D", "Provider D", 
    "Provider B", "Provider B", "Provider B", "Provider B", "Provider B", 
    "Provider B", "Provider B")), .Names = c("tme", "degc", "rh", 
"type"), row.names = c(NA, -80L), class = c("data.table", "data.frame"
), .internal.selfref = <pointer: 0x0000000000120788>)

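I'm really not sure how to proceed. I need to repeat this exercise for multiple datasets (a few hundred rows each) with up to 30 variables (the sample data has only two).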

1 Answer:

Answer 0: (score: 0)

I think this is what you are asking for.

fAct = approxfun(dt$tme[dt$type=='Actual'], dt$degc[dt$type=='Actual'])

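This returns a function that gives a piecewise linear approximation of the actual values at any time within the observed range. You can then compare it with the values from the various providers. For example,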

> dt[35,]
                   tme degc rh       type
35 2018-01-18 07:00:00    6 47 Provider W
> fAct(dt$tme[35])   # dt$tme[35] yields a plain vector for both data.frame and data.table
[1] 6.69

So Provider W forecast that degc would be 6 at 2018-01-18 07:00:00. The (interpolated) actual value is 6.69, so the forecast is off by 0.69.
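
The same idea can be applied to every forecast row, and to several variables, in one pass. The following is only a minimal sketch in base R, using a small excerpt of the sample data (three Actual rows and two Provider W rows). The helper name `forecast_errors`, the `_err` column suffix, and the sign convention (forecast minus interpolated actual) are assumptions made for illustration, not part of the original answer.

```r
# Small excerpt of the sample data above (temperature only).
dt <- data.frame(
  tme  = as.POSIXct(c(1516253400, 1516257000, 1516260600,   # Actual
                      1516255200, 1516258800),              # Provider W
                    origin = "1970-01-01", tz = "UTC"),
  degc = c(2.82, 5.58, 7.8, 4, 6),
  type = c("Actual", "Actual", "Actual", "Provider W", "Provider W"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each variable in `vars`, interpolate the Actual
# series linearly and subtract it from every forecast value.
forecast_errors <- function(d, vars, time_col = "tme", type_col = "type") {
  actual   <- d[d[[type_col]] == "Actual", ]
  forecast <- d[d[[type_col]] != "Actual", ]
  x_act <- as.numeric(actual[[time_col]])
  x_fc  <- as.numeric(forecast[[time_col]])
  for (v in vars) {
    f <- approxfun(x_act, actual[[v]])   # NA outside the Actual time range
    forecast[[paste0(v, "_err")]] <- forecast[[v]] - f(x_fc)
  }
  forecast
}

err <- forecast_errors(dt, vars = "degc")
print(err)

# Mean absolute error per provider
mae <- tapply(abs(err$degc_err), err$type, mean, na.rm = TRUE)
print(mae)
```

With the full data you would pass all variable names in `vars`, for example `c("degc", "rh")`. Note that in the sample data `rh` appears to be stored as a fraction for some types and as a percentage for others, so it would need to be put on a common scale before errors are compared.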

Edit

As @RalfStubner points out, you can get a smoother (non-linear) approximation with

fAct2 = splinefun(dt$tme[dt$type=='Actual'], dt$degc[dt$type=='Actual'])
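
To see how the two approaches differ, here is a small hedged comparison on five consecutive Actual temperature readings taken from the sample data. The variable names (`f_lin`, `f_spl`, `t_fc`, `t_out`) are illustrative only, and because only an excerpt is used, the spline value will not match what `fAct2` gives on the full series.

```r
# Excerpt of the Actual temperature series from the sample data.
x <- c(1516249800, 1516253400, 1516257000, 1516260600, 1516264200)
y <- c(1.13, 2.82, 5.58, 7.8, 7.85)

f_lin <- approxfun(x, y)   # piecewise linear
f_spl <- splinefun(x, y)   # cubic spline (default method "fmm")

t_fc <- 1516258800         # Provider W forecast at 2018-01-18 07:00:00 UTC
print(c(linear = f_lin(t_fc), spline = f_spl(t_fc)))

# Outside the observed range the two behave differently:
# approxfun() returns NA by default (rule = 1), splinefun() extrapolates.
t_out <- max(x) + 3600
print(c(linear = f_lin(t_out), spline = f_spl(t_out)))
```

Both functions reproduce the observations exactly at the observed timestamps; they differ only in between and beyond them. Spline extrapolation can drift quickly, so forecast points outside the range of the actual data are probably best excluded from the error calculation.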