I have a question about the time series forecasting I am doing for a project.
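Because the series shows seasonal effects, I am using SARIMA (seasonal ARIMA) for forecasting. SARIMA takes the parameters p, d, q, P, D, Q and m. Most of these can be selected automatically with auto ARIMA, but the term m has to be supplied manually. It is the number of observations in one seasonal cycle (52 for weekly data with a yearly cycle, 12 for monthly data, and so on).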
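How exactly should this term be determined? In my series the cycle looks somewhat irregular: the pattern sometimes repeats every two weeks and sometimes every two months. The data are collected weekly (2 years and 3 months of data, 363 data points).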
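One common way to get a first guess at the cycle length is to look for peaks in the sample autocorrelation. The following is only a minimal sketch under assumptions: the function names `autocorrelation` and `candidate_periods` are made up for illustration, and the series is synthetic (a sine wave with a cycle of 8 observations), not the data from this question.

```python
import math

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive integer `lag`."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    if var == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var

def candidate_periods(series, max_lag, top=3):
    """Return up to `top` lags that are local peaks of the autocorrelation,
    ordered from the strongest peak to the weakest."""
    acf = {lag: autocorrelation(series, lag) for lag in range(1, max_lag + 1)}
    peaks = [
        lag for lag in range(2, max_lag)
        if acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]
    ]
    return sorted(peaks, key=lambda lag: acf[lag], reverse=True)[:top]

# Synthetic series with a cycle of 8 observations (illustration only).
demo = [10 + 3 * math.sin(2 * math.pi * t / 8) for t in range(120)]
periods = candidate_periods(demo, max_lag=30)
print(periods)  # the strongest peak is expected at lag 8
```

If the strongest peaks on the real data do not sit at a stable lag, that would be consistent with the irregular cycle described above, and a single fixed m may not describe the series well.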
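Because of the confusion about choosing m, we kept it at 52 (since the data points are weekly) and got a decent MAPE (around 9). However, when I increased m to 80, the MAPE dropped further to 3 and the forecast plot followed the actual plot more closely. When I increased m beyond 80, the code raised a ValueError.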
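For reference, the MAPE values quoted here are assumed to follow the usual definition, the mean of |actual - predicted| / |actual| expressed in percent. A minimal sketch of that definition, with made-up numbers rather than the data from this question:

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

# Made-up values: relative errors of 10%, 5% and 0%.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
score = mape(actual, predicted)
print(round(score, 2))  # about 5.0 percent
```

Note that MAPE is averaged over the test points only, so with a short test set a few points can move it a lot.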
Does anyone know why this happens?
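I enlarged the test data set and then increased m, which helped, but a higher m does not necessarily mean a lower MAPE. m = 80 gives a lower MAPE, while m = 81 and m = 85 give higher values. This looks random, but I am sure the value of m matters a lot for the shape of the forecast. I am attaching the images here for better understanding.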
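To check whether a given m really helps or just happens to fit one test window, candidate values can be compared on a holdout set with a very simple baseline. The sketch below swaps in a seasonal-naive forecast (repeat the last m observations) instead of SARIMA, so it only illustrates the comparison procedure; `seasonal_naive_forecast` and `score_periods` are hypothetical helper names and the series is synthetic.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def seasonal_naive_forecast(train, horizon, m):
    """Forecast `horizon` steps by repeating the last `m` training observations."""
    if m < 1 or m > len(train):
        raise ValueError("m must be between 1 and len(train)")
    last_cycle = train[-m:]
    return [last_cycle[h % m] for h in range(horizon)]

def score_periods(series, candidates, horizon):
    """Holdout MAPE of the seasonal-naive forecast for each candidate period."""
    train, test = series[:-horizon], series[-horizon:]
    return {m: mape(test, seasonal_naive_forecast(train, horizon, m)) for m in candidates}

# Synthetic series: cycle of 12 observations plus a mild trend (illustration only).
demo = [50 + 10 * math.sin(2 * math.pi * t / 12) + 0.05 * t for t in range(144)]
scores = score_periods(demo, candidates=[4, 6, 12, 13], horizon=24)
best_m = min(scores, key=scores.get)
print(best_m, {m: round(s, 2) for m, s in scores.items()})
```

On this synthetic series the true cycle length wins clearly, and a neighbouring value such as 13 scores noticeably worse. If neighbouring values of m give erratic scores on real data, that suggests the differences come from the particular test window rather than from a genuine seasonal period.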
ValueError Traceback (most recent call last)
<ipython-input-90-8517b4596f58> in <module>
----> 1 model_fit = train_auto_arima(train_df1.DAT_RATE)
2 model_fit.fit(train_df1.values)
3 print(f'Params - > \n aic-{model_fit.aic()}, \n get_params-{model_fit.get_params()}')
4 # forecasting
5 test_predictions = forecast_over_test_set(model_fit, test_df1.DAT_RATE, train_df1.DAT_RATE)
<ipython-input-89-218545403b7c> in train_auto_arima(df)
81 stepwise=False, # We are going with Parallel execution rather than step-wise approach
82 information_criterion='bic',
---> 83 trace=True, error_action='ignore')
84
85 return arima
~\AppData\Roaming\Python\Python37\site-packages\pmdarima\arima\auto.py in auto_arima(y, exogenous, start_p, d, start_q, max_p, max_d, max_q, start_P, D, start_Q, max_P, max_D, max_Q, max_order, m, seasonal, stationary, information_criterion, alpha, test, seasonal_test, stepwise, n_jobs, start_params, trend, method, transparams, solver, maxiter, disp, callback, offset_test_args, seasonal_test_args, suppress_warnings, error_action, trace, random, random_state, n_fits, return_valid_fits, out_of_sample_size, scoring, scoring_args, with_intercept, sarimax_kwargs, **fit_args)
320 if seasonal_test_args is not None else dict()
321 D = nsdiffs(xx, m=m, test=seasonal_test, max_D=max_D,
--> 322 **seasonal_test_args)
323
324 if D > 0 and exogenous is not None:
~\AppData\Roaming\Python\Python37\site-packages\pmdarima\arima\utils.py in nsdiffs(x, m, max_D, test, **kwargs)
105
106 D = 0
--> 107 dodiff = testfunc(x)
108 while dodiff == 1 and D < max_D:
109 D += 1
~\AppData\Roaming\Python\Python37\site-packages\pmdarima\arima\seasonality.py in estimate_seasonal_differencing_term(self, x)
456
457 # Get the critical value for m
--> 458 stat = self._compute_test_statistic(x)
459 crit_val = self._calc_ocsb_crit_val(self.m)
460 return int(stat > crit_val)
~\AppData\Roaming\Python\Python37\site-packages\pmdarima\arima\seasonality.py in _compute_test_statistic(self, x)
417 # Compute the actual linear model used for determining the test stat
418 try:
--> 419 regression = self._fit_ocsb(x, m, maxlag, maxlag)
420 except np.linalg.LinAlgError: # Singular matrix
421 if crit_regression is not None:
~\AppData\Roaming\Python\Python37\site-packages\pmdarima\arima\seasonality.py in _fit_ocsb(x, m, lag, max_lag)
341 y_first_order_diff = diff(x, m)
342 y = diff(y_first_order_diff)
--> 343 ylag = OCSBTest._gen_lags(y, lag)
344
345 if max_lag > 0:
~\AppData\Roaming\Python\Python37\site-packages\pmdarima\arima\seasonality.py in _gen_lags(y, max_lag, omit_na)
334
335 # delegate down
--> 336 return OCSBTest._do_lag(y, max_lag, omit_na)
337
338 @staticmethod
~\AppData\Roaming\Python\Python37\site-packages\pmdarima\arima\seasonality.py in _do_lag(y, lag, omit_na)
319 # Create a 2d array of dims (n + (lag - 1), lag). This looks cryptic..
320 # If there are tons of lags, this may not be super efficient...
--> 321 out = np.ones((n + (lag - 1), lag)) * np.nan
322 for i in range(lag):
323 out[i:i + n, i] = y
C:\ProgramData\Anaconda3\lib\site-packages\numpy\core\numeric.py in ones(shape, dtype, order)
221
222 """
--> 223 a = empty(shape, dtype, order)
224 multiarray.copyto(a, 1, casting='unsafe')
225 return a
ValueError: negative dimensions are not allowed
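One possible reading of the traceback: `_fit_ocsb` differences the series at lag m and then once more at lag 1, and `_do_lag` then asks NumPy for an array of shape `(n + (lag - 1), lag)`. If no observations survive the differencing (n = 0) and the lag is 0, the first dimension becomes -1. The sketch below only reproduces that arithmetic in plain Python. It assumes that each difference at lag k drops k observations (never below zero) and that `lag` is 0; the training length of 82 is a hypothetical number chosen for illustration, not taken from the question.

```python
def remaining_after_differencing(n_obs, m):
    """Observations left after one seasonal difference (lag m) and one first difference.

    Assumption: a difference at lag k removes k observations, never going below zero.
    """
    after_seasonal = max(n_obs - m, 0)
    return max(after_seasonal - 1, 0)

def lag_matrix_shape(n_obs, m, lag):
    """Shape given by the expression `(n + (lag - 1), lag)` shown in the traceback."""
    n = remaining_after_differencing(n_obs, m)
    return (n + (lag - 1), lag)

def is_period_feasible(n_obs, m, lag=0):
    """True if every requested dimension is non-negative."""
    return all(dim >= 0 for dim in lag_matrix_shape(n_obs, m, lag))

# Hypothetical training length of 82 observations (illustration only).
feasible_80 = is_period_feasible(82, 80)  # one observation survives the differencing
feasible_81 = is_period_feasible(82, 81)  # nothing survives, first dimension is -1
print(feasible_80, lag_matrix_shape(82, 80, 0))
print(feasible_81, lag_matrix_shape(82, 81, 0))
```

Under these assumptions the error appears as soon as m gets within one observation of the length of the series passed to the seasonal test, which would explain why m = 80 runs and larger values fail. Even where the call succeeds, a series that covers barely one cycle of length m leaves the seasonal test with almost no data to work with.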