R - 使用nnetar进行预测时的预测间隔较窄

时间:2018-04-30 10:43:05

标签: r time-series forecasting

我正在尝试使用预测包中的nnetar函数构建预测。 与其他方法相比,我得到了一些相当不错的预测,但我遇到的问题是它产生非常窄的预测间隔。

我尝试预测的数据集是来自电子商务的每周收入数据,转化率和adspend作为解释性x变量(Xreg)

这就是我的预测结果: enter image description here 这是我用来制作它的代码:

fit_test <- nnetar(total_revenue_ts, size = 5, repeats = 200, xreg = xreg)
fit_test_fc <- forecast(fit_test, PI=TRUE , xreg = xreg_test, h=26)
autoplot(fit_test_fc) + autolayer(test_rev_ts$total)

这是我使用的数据:

total_revenue_ts <- structure(c(429527.84912, 5107534.789265, 5334742.992202, 7062236.076739, 
7376937.2329, 8843885.679834, 10312889.001099, 4743025.186331, 
1063820.467744, 8647610.570788, 7615849.910587, 6950888.592508, 
6858879.08066, 7207686.138817, 6552543.847104, 6286320.862515, 
6387758.212579, 6267651.456223, 6166523.577491, 6517987.757523, 
4032163.322867, 6774882.672302, 7280882.606489, 7042888.802793, 
5864325.907872, 7614073.472534, 5702820.168872, 5993043.498666, 
5748712.530684, 5781854.779949, 6514731.488613, 6200435.741256, 
6716691.630149, 5671091.670532, 6849896.078633, 6412725.445233, 
5820498.437424, 5140661.371894, 5543105.774292, 6498649.993838, 
6832579.992745, 6363471.54561, 5764986.861829, 6479827.767348, 
6082916.613222, 5654806.062709, 6250723.443025, 7021696.610899, 
6878521.38167, 6605964.840134, 5860880.924163, 6027383.028358, 
7271275.876805, 5788375.978398, 5952319.104294, 8700792.56985, 
9387497.556219, 10628335.699833, 12300448.541447, 7624816.545391, 
8041602.838183, 7340912.745611, 6475830.912185, 6511598.406008, 
7670675.084654, 6597851.103698, 5992838.357045, 5782002.308393, 
7591927.838791, 6316308.891923, 6024260.46223, 6099526.226113, 
5341138.559686, 5959177.962052, 4614361.675905, 5649334.049846, 
6774789.19439, 7823320.381864, 5941416.816392, 6576822.658397, 
4949544.168466, 6394315.633561, 5432101.434962, 5971872.77196, 
6375234.021085, 6776885.612781, 6381300.2023, 5376238.120971, 
4654630.262986, 5404870.534346, 6616177.722868, 6627152.023493, 
6566693.385556, 6687236.645467, 6473086.938295, 5478904.979073, 
5884130.390298, 6219789.15664), .Tsp = c(2015.84615384615, 2017.71153846154, 
52), class = "ts")

xreg <- structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 5723.69, 5528.78, 6099.31, 13001.39, 6750.07, 
6202.91, 6685.01, 5020, 5094.73, 2714.07, 9645.9, 8208.18, 6297.5, 
8053.29, 0, 4418.27, 9393.52, 11139.19, 12678.08, 12493.18, 11242.28, 
9617.09, 6959.37, 11716.52, 8464.61, 1499.14, 14538.86, 12080.69, 
11905.71, 14405.72, 9077.05, 10362.49, 13776.75, 17620.9, 14767.2, 
19511.98, 19747.72, 19885.44, 16810.46, 10618.04, 7494.02, 8166.45, 
7503.29, 7955.54, 7971.87, 14520.84, 19219.74, 18824.67, 27216.48, 
32030.82, 32007.76, 24153.88, 20472.33, 17617.01, 4.77806579193501, 
5.7287751461365, 5.28098389402001, 5.02434789173682, 4.95184840012426, 
5.64277441770686, 5.37984870432963, 5.3906432267007, 5.43849275673932, 
5.6884135855546, 5.2709838799333, 5.41075942817439, 4.94887009216008, 
4.95521307717827, 5.62734185585188, 5.51042637235732, 5.29807054480431, 
5.52756845275268, 5.70969961018115, 5.54781801299907, 5.73014260988972, 
5.99759204482959, 6.22750289793179, 5.93356463634027, 5.69127817622951, 
5.57154841999638, 5.66114857960352, 5.72923212265929, 5.31293510374571, 
5.35736716492903, 5.65568332596196, 5.74619318262752, 5.5954764989987, 
5.34701430785202, 5.38617886178862, 6.0341348094332, 5.46323395671082, 
5.33899929707969, 5.22135801253651, 5.65190410869423, 5.28112320474013, 
4.80649483723496, 4.81842452314323, 5.00675102835432, 4.49345845605863, 
3.82212461085761, 4.62551440329218, 3.79930173346953, 5.71101883613167, 
6.40135958079592, 7.1027311558873, 4.0456548762572, 4.86275624624909, 
3.68451118002285, 5.40269725877529, 5.24419134903069, 5.0344951706761, 
4.89131058216232, 5.63214154072982, 5.52286515754452, 4.99781361730586, 
5.09012974090091, 5.43346256247373, 5.20251523559131, 5.25889558131295, 
4.17869474160865, 5.59036205822923, 5.33376848927069, 5.38868363783592, 
5.43341024859593, 5.19857108205253, 5.19137882047327, 5.23814895237021, 
5.01957530659338, 5.48137535816619, 5.67523044227311, 5.26029025707068, 
5.18449109254837, 5.24915583751151, 5.45151430953043, 5.34584086799277, 
4.97336938233212, 5.22618004090631, 5.52619366814479, 5.70389182510811, 
5.75578084064244, 5.53339664450776, 5.16303263313334, 5.88409835642594, 
5.56461936196381, 5.20891730381574, 5.21675833063733, 5.30279468609766, 
5.22628072593614, 4.77056025260184, 4.72482746416563, 4.68623694730198, 
5.07214098963881), .Dim = c(98L, 2L), .Dimnames = list(NULL, 
    c("adCost", "transactionsPerSession")), .Tsp = c(2015.84615384615, 
2017.71153846154, 52), class = c("mts", "ts", "matrix"))

xreg_test <- structure(c(17617.01, 13526.88, 14836.89, 20358.16, 20416.79, 
21635.72, 15456.3, 12569.27, 18673, 20591.58, 18922.52, 19658.27, 
21371.37, 20921.06, 18846.68, 17315.48, 18569.47, 20276.32, 17932.33, 
18405.48, 17566.76, 15605.29, 18694.58, 17082.73, 18291.26, 18211.78, 
18252.98, 5.07214098963881, 4.9644513137558, 4.50735617759714, 
3.42940249666707, 5.57244242550868, 6.85297018333131, 8.27499041424656, 
5.64773791252811, 4.17746355274814, 4.78132627344352, 4.5212649754887, 
4.16629173040583, 3.95132622368061, 4.2603550295858, 4.07247936849659, 
3.98828918165935, 3.8364837584878, 4.32967453511229, 4.10479719434903, 
3.88986772076209, 3.89750505731625, 4.02224223511425, 4.23119830350054, 
3.54885240337703, 4.05530730967035, 4.46043036568541, 4.59654125314768
), .Dim = c(27L, 2L), .Dimnames = list(NULL, c("adCost", "transactionsPerSession"
)), .Tsp = c(2017.71153846154, 2018.21153846154, 52), class = c("mts", 
"ts", "matrix"))

test_rev_ts$total <- structure(c(6219789.15664, 6207675.91913, 5375609.354946, 5970907.816396, 
4905889.954914, 6003436.003269, 6311734.743992, 5771009.21678, 
5284469.645259, 7228321.956032, 7070364.421462, 8978263.238038, 
11173150.908703, 8212310.181272, 5336736.750351, 6918492.690826, 
7807812.156676, 7025220.106499, 6539795.925754, 6734049.267568, 
6736165.004623, 5775402.314813, 6083716.578991, 6441420.211984, 
6269669.541568, 4968476.314634, 11122809.394872), .Tsp = c(2017.71153846154, 
2018.21153846154, 52), class = "ts")

如果有人能解释为什么我会得到如此狭窄的预测间隔以及如何解决它,我真的很感激。

1 个答案:

答案 0 :(得分:3)

为什么预测间隔如此狭窄?

默认情况下,nnetar会将 in-sample 残差中的信息用于预测间隔中使用的创新。根据模型的复杂程度,残差可以任意小。 documentation gives this warning

  

请注意,如果网络太复杂并且数据过多,那么   残差可以任意小;如果用于预测间隔   计算,它们可能导致误导性的小值。

与此相关,您的时间序列有98个点,模型有31个参数。此外,数据的季节性周期为52,当使用季节性滞后时,您实际上只有46个数据点适合。

作为参考,nnetar残差的标准偏差大约是auto.arima残差的4倍。

如何处理窄预测间隔?

有几种可能性。为了加快这些示例的计算,我从您的示例中减少了拟合模型的数量(到repeats = 50)和PI模拟的数量(到npaths = 50)。要考虑这些变化和RNG的影响,请考虑以下模型作为基线:

set.seed(1234)
fit_test <- nnetar(total_revenue_ts, size = 5, repeats = 50, xreg = xreg)
fit_test_fc <- forecast(fit_test, PI=TRUE , xreg = xreg_test, npaths = 50)
autoplot(fit_test_fc) + autolayer(test_rev_ts)

enter image description here

  • forecast使用

    提供更好的创新

    这些会影响间隔,但平均预测将保持不变。

    • 手动设置创新

      如果您对更合适的创新有一些外部知识 使用时,您可以通过innov参数提供它们 预测。

      例如,说你碰巧知道标准偏差 这些创新应该比真正的大3倍 残差显示。然后你可以这样做:

      set.seed(1234) fit_test <- nnetar(total_revenue_ts, size = 5, repeats
      = 50, xreg = xreg)
      ## Set up new innovations for PI 
      res_sd <- sd(residuals(fit_test), na.rm=T) 
      myinnovs <- rnorm(nrow(xreg_test)*50, mean=0, sd=res_sd*3)
      ## fit_test_fc <- forecast(fit_test, PI=TRUE , xreg = xreg_test, npaths = 50, innov = myinnovs) 
      autoplot(fit_test_fc) + autolayer(test_rev_ts)
      

      enter image description here

    • 使用样本外的值

      您可以通过使用样本外来估算更好的创新 残差而不是样本内残差。中的subset参数 nnetar允许您仅适合部分数据。你也可以用 用于交叉验证的CVar函数并从中获取残差 那里。这是使用后者的一个例子:

      set.seed(1234) 
      fit_test <- nnetar(total_revenue_ts, size = 5, repeats = 50, xreg = xreg)
      ## Set up new innovations for PI 
      fit_test_cv < CVar(total_revenue_ts,  size = 5, repeats = 50, xreg = xreg) 
      res_sd <- sd(fit_test_cv$residuals, na.rm=T) 
      myinnovs <- rnorm(nrow(xreg_test)*50, mean=0, sd=res_sd)
      ## 
      fit_test_fc <- forecast(fit_test, PI=TRUE , xreg = reg_test, npaths = 50, innov = myinnovs) 
      autoplot(fit_test_fc) + autolayer(test_rev_ts)
      

      enter image description here

  • 控制过度拟合

    除预测间隔外,这些修改还会影响您的模型,因此与基线相比,平均预测会发生变化。

    • 放弃季节性延迟

      您的数据大约有2个季节性时段。由于模型需要拟合和预测的滞后值,因此使用季节性滞后会使您失去很大一部分。你可以删除季节性组件,可能会增加额外的滞后来补偿。在下面的例子中,由于有更多滞后,我将参数数量增加到36,但由于没有季节性滞后而“获得”49分。

      set.seed(1234)
      fit_test <- nnetar(total_revenue_ts, p=3, P=0, size = 5, repeats = 50, xreg = xreg)
      fit_test_fc <- forecast(fit_test, PI=TRUE , xreg = xreg_test, npaths = 50)
      autoplot(fit_test_fc) + autolayer(test_rev_ts)
      

      enter image description here

    • 降低模型复杂度

      如前所述,有5个神经元,你有31个参数。例如,将该数字删除到size=2会将参数数量减少到13。

      set.seed(1234)
      fit_test <- nnetar(total_revenue_ts, size = 2, repeats = 50, xreg = xreg)
      fit_test_fc <- forecast(fit_test, PI=TRUE , xreg = xreg_test, npaths = 50)
      autoplot(fit_test_fc) + autolayer(test_rev_ts)
      

      enter image description here

    • 使用正规化

      为了弥补模型的复杂性,我们可以使用decay中的nnet参数进行正则化。

      set.seed(1234)
      fit_test <- nnetar(total_revenue_ts, size = 5, repeats = 50, xreg = xreg, decay = 1)
      fit_test_fc <- forecast(fit_test, PI=TRUE , xreg = xreg_test, npaths = 50)
      autoplot(fit_test_fc) + autolayer(test_rev_ts)
      

      enter image description here

底线

如果适合您的使用,也可以将这些选项中的一些组合使用,但最重要的是要记住这些是复杂的模型,而且只有大约100个数据点可以做到这一点。

以下是正则化模型与样本外残差的组合:

set.seed(1234)
fit_test <- nnetar(total_revenue_ts, size = 5, repeats = 50, xreg = xreg, decay = 0.1)
## Set up new innovations for PI
fit_test_cv <- CVar(total_revenue_ts,  size = 5, repeats = 50, xreg = xreg, decay = 0.1)
res_sd <- sd(fit_test_cv$residuals, na.rm=T)
myinnovs <- rnorm(nrow(xreg_test)*50, mean=0, sd=res_sd)
##
fit_test_fc <- forecast(fit_test, PI=TRUE , xreg = xreg_test, npaths = 50, innov = myinnovs)
autoplot(fit_test_fc) + autolayer(test_rev_ts)

enter image description here

**请注意,在示例中,我假设创新的正态分布并且仅改变标准差,但是当通过innov参数手动添加它们时,它们也可以遵循任何其他任意分布。 / p>