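This is my first post on this very useful platform. I am a beginner in time series modelling. I am trying to develop a SARIMAX model for univariate time series forecasting. I have two years of daily operating-hours data for a piece of equipment, which I resampled to weekly data. I want to forecast the equipment's future operating time (the next 16 weeks).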
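A minimal sketch of the daily-to-weekly step. It assumes the daily values are averaged, because the post does not say which aggregation was used, and it uses a made-up daily series in place of the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical daily operating hours (4 weeks starting Monday 2016-08-01);
# replace with the real daily series.
days = pd.date_range("2016-08-01", periods=28, freq="D")
daily = pd.Series(np.arange(28, dtype=float), index=days, name="duration")

# "W" is an alias for "W-SUN": each weekly bin ends on, and is labelled by, a Sunday,
# which matches the Sunday dates in the data table further down.
weekly = daily.resample("W").mean()
print(weekly)
```

If the target should be total hours per week rather than average hours per day, .sum() would replace .mean().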
I used the grid search approach described in this article: https://www.digitalocean.com/community/tutorials/a-guide-to-time-series-forecasting-with-arima-in-python-3 to identify the model's hyperparameters.
The Dickey-Fuller test indicates that the data is stationary. Here is the printed output (weekly resampled data):
Results of Dickey-Fuller Test:
Test Statistic -6.651852e+00
p-value 5.097401e-09
#Lags Used 0.000000e+00
Number of Observations Used 7.300000e+01
Critical Value (1%) -3.523284e+00
Critical Value (5%) -2.902031e+00
Critical Value (10%) -2.588371e+00
dtype: float64
My model summary is shown below:
Statespace Model Results
==========================================================================================
Dep. Variable: duration No. Observations: 74
Model: SARIMAX(1, 0, 0)x(1, 1, 0, 26) Log Likelihood -53.441
Date: Wed, 17 Jul 2019 AIC 112.881
Time: 16:43:37 BIC 116.015
Sample: 0 HQIC 113.561
- 74
Covariance Type: opg
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
ar.L1 0.2311 0.221 1.044 0.296 -0.203 0.665
ar.S.L26 -0.3097 0.252 -1.228 0.220 -0.804 0.185
sigma2 9.5039 2.397 3.965 0.000 4.806 14.202
===================================================================================
Ljung-Box (Q): 13.44 Jarque-Bera (JB): 7.02
Prob(Q): 0.86 Prob(JB): 0.03
Heteroskedasticity (H): 3.84 Skew: -0.60
Prob(H) (two-sided): 0.10 Kurtosis: 5.56
===================================================================================
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
Here is the modelling code:
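import statsmodels.api as sm

# df_train / df_test: the weekly training and test parts of the operating-hours series
mod = sm.tsa.statespace.SARIMAX(df_train,
                                order=(1, 0, 0),
                                seasonal_order=(1, 1, 0, 26),  # seasonal period of 26 weeks
                                enforce_stationarity=False,
                                enforce_invertibility=False)
results = mod.fit(disp=False)
# forecast as many steps ahead as there are test observations
pred = results.get_forecast(steps=len(df_test))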
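The forecast appears to lag the actual values by about two weeks. I have attached the results to this post. This is the result after shifting the forecast values: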
dh = dh.shift(-2).dropna()
Shifted forecast values in red
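Image showing forecast and actual values of operating hours. Red is forecasted, blue is actual data.
Can someone check whether my approach is correct, and explain why the forecast is shifted by two weeks (even though the seasonal factors differ by one week)?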
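The question uses dh.shift(-2) to line up the series. One way to quantify this offset is to try several shifts of the forecast and see which one minimises the RMSE against the actual values. In the sketch below, best_shift is a hypothetical helper, and both series are assumed to share the same weekly index:

```python
import numpy as np
import pandas as pd


def best_shift(actual, forecast, max_shift=4):
    """Return (k, rmse) for the shift k in [-max_shift, max_shift] minimising RMSE(actual, forecast.shift(k))."""
    scores = {}
    for k in range(-max_shift, max_shift + 1):
        shifted = forecast.shift(k)
        mask = shifted.notna() & actual.notna()  # compare only overlapping weeks
        err = actual[mask] - shifted[mask]
        scores[k] = float(np.sqrt(np.mean(err ** 2)))
    k_best = min(scores, key=scores.get)
    return k_best, scores[k_best]


# Synthetic check: a "forecast" that simply lags the actual values by two weeks.
idx = pd.date_range("2017-01-01", periods=20, freq="W-SUN")
actual = pd.Series(np.random.default_rng(3).normal(15, 2, 20), index=idx)
lagged_forecast = actual.shift(2)
print(best_shift(actual, lagged_forecast))  # -> (-2, 0.0)
```

A forecast that lines up best only after shifting is often largely repeating earlier observations. For example, with D=1 and s=26 the forecast is anchored to the value 26 weeks earlier. Shifting is therefore a diagnostic, not a fix.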
PS: I chose 26 as the seasonal period after studying the seasonal decomposition plot. Here is the test data for clarification:
date duration
8/7/2016 14.75865079
8/14/2016 15.72940476
8/21/2016 16.12214286
8/28/2016 14.3756746
9/4/2016 14.90861111
9/11/2016 15.34690476
9/18/2016 16.15107143
9/25/2016 15.98257937
10/2/2016 8.374642857
10/9/2016 15.12717593
10/16/2016 15.91464286
10/23/2016 15.8356746
10/30/2016 16.75575397
11/6/2016 14.32138889
11/13/2016 15.60551587
11/20/2016 16.24988095
11/27/2016 15.95936508
12/4/2016 14.61742063
12/11/2016 13.545
12/18/2016 17.02488095
12/25/2016 9.159555556
1/8/2017 12.81242063
1/15/2017 16.20285714
1/22/2017 17.0834127
1/29/2017 18.40464286
2/5/2017 13.39559524
2/12/2017 16.36452381
2/19/2017 16.67698413
2/26/2017 15.62789683
3/5/2017 17.31428571
3/12/2017 17.40829365
3/19/2017 15.82539683
3/26/2017 15.21595238
4/2/2017 16.4109127
4/9/2017 11.38543651
4/16/2017 11.46966667
4/23/2017 13.79509259
4/30/2017 16.13079365
5/7/2017 14.43949074
5/14/2017 14.25813492
5/21/2017 15.21011905
5/28/2017 15.13231481
6/4/2017 13.35690476
6/11/2017 11.24513889
6/18/2017 16.33047619
6/25/2017 15.20654762
7/2/2017 13.08047619
7/9/2017 15.07047619
7/16/2017 16.03702381
7/23/2017 14.91428571
7/30/2017 13.3331746
8/6/2017 13.09619048
8/13/2017 14.51670635
8/20/2017 15.48579365
8/27/2017 10.42162698
9/3/2017 14.43809524
9/10/2017 15.2334127
9/17/2017 14.91301587
9/24/2017 14.6190873
10/1/2017 15.05559524
10/8/2017 16.16888889
10/15/2017 10.23011905
10/22/2017 14.50650794
10/29/2017 16.0815873
11/5/2017 13.52162037
11/12/2017 13.93670635
11/19/2017 14.02361111
11/26/2017 14.46198413
12/3/2017 14.57138889
12/10/2017 15.00194444
12/17/2017 6.562777778
12/24/2017 9.812314815
12/31/2017 9.812314815
1/7/2018 12.87944444
1/14/2018 15.5634127
1/21/2018 16.02464286
1/28/2018 14.96492063
2/4/2018 16.66015873
2/11/2018 11.89059524
2/18/2018 14.45646825
2/25/2018 14.84785714
3/4/2018 15.39595238
3/11/2018 14.02646825
3/18/2018 16.09496032
3/25/2018 14.69738095
4/1/2018 9.777777778
4/8/2018 13.21705556
4/15/2018 15.90865079
4/22/2018 16.01595238
4/29/2018 16.88354167
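In the table above, the week of 1/1/2017 is missing: 12/25/2016 is followed directly by 1/8/2017. The model's lags count rows, not calendar weeks, so a missing row shifts every later seasonal lag by one week. The sketch below shows one way to spot such gaps. It uses a few rows copied from the table and assumes the dates are month/day/year:

```python
import io

import pandas as pd

# A few rows copied from the table above, around the 2016/2017 year end.
raw = """date duration
12/11/2016 13.545
12/18/2016 17.02488095
12/25/2016 9.159555556
1/8/2017 12.81242063
1/15/2017 16.20285714
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")
s = df.set_index("date")["duration"]

# Reindex onto a regular Sunday-based weekly grid; missing weeks appear as NaN.
regular = s.asfreq("W-SUN")
print(regular[regular.isna()])  # the week of 2017-01-01 is missing
```

How to fill such a gap (interpolation, or re-aggregating the daily data) is a modelling choice; regular.interpolate() is one simple option.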
Thanks!
Answer (score: 0)
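I think your code is fine and your model's predictions are correct rather than two weeks off. They look like that because, after tweaking the parameters, the result is essentially "random"... ;-)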
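But I think the model itself is the problem. How exactly did you choose the parameters (p,d,q)(P,D,Q)s? Your data seems to be seasonal, with a monthly cycle, so you should probably keep the s parameter at 12 (as suggested in the docs).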
I ran a grid search over the other parameters and evaluated each model by its root mean squared error (from statsmodels.tools.eval_measures import rmse).
The best result was:
import statsmodels.api as sm

mod = sm.tsa.statespace.SARIMAX(df_train,
                                order=(2, 1, 0),
                                seasonal_order=(1, 1, 1, 12),  # seasonal period s = 12
                                enforce_stationarity=False,
                                enforce_invertibility=False)
However, this model is still far from ideal. You probably need more data to get a better model (or could try a different algorithm).