I have time series data with one observation per week.
I want to use an ARIMA model to forecast the next few weeks.
Here is a plot of my time series data:
First, I use the seasonal_decompose function from statsmodels to inspect the trend, seasonality, and residuals:
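from statsmodels.tsa.seasonal import seasonal_decompose
# Weekly data: one yearly cycle is about 52 observations; passing period avoids relying on the index frequency
result = seasonal_decompose(df['comissions'], model='additive', period=52)
result.plot();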
I check whether my data is stationary:
import pandas as pd
from statsmodels.tsa.stattools import adfuller
def adf_test(series, title=''):
    """
    Pass in a time series and an optional title; prints an ADF report
    """
    print(f'Augmented Dickey-Fuller Test: {title}')
    result = adfuller(series.dropna(), autolag='AIC')  # .dropna() handles differenced data
    labels = ['ADF test statistic', 'p-value', '# lags used', '# observations']
    out = pd.Series(result[0:4], index=labels)
    for key, val in result[4].items():
        out[f'critical value ({key})'] = val
    print(out.to_string())  # .to_string() removes the line "dtype: float64"
    if result[1] <= 0.05:
        print("Strong evidence against the null hypothesis")
        print("Reject the null hypothesis")
        print("Data has no unit root and is stationary")
    else:
        print("Weak evidence against the null hypothesis")
        print("Fail to reject the null hypothesis")
        print("Data has a unit root and is non-stationary")
adf_test(df['n_transactions'])
Augmented Dickey-Fuller Test:
ADF test statistic -3.857922
p-value 0.002367
# lags used 12.000000
# observations 737.000000
critical value (1%) -3.439254
critical value (5%) -2.865470
critical value (10%) -2.568863
Strong evidence against the null hypothesis
Reject the null hypothesis
Data has no unit root and is stationary
I use auto_arima to find the best parameters for the model:
from pmdarima import auto_arima
auto_arima(df['comissions'],seasonal=True, m = 7).summary()
from statsmodels.tsa.statespace.sarimax import SARIMAX
import matplotlib.pyplot as plt
train = df.loc[:'2020-04-26']
test = df.loc['2020-05-03':]
model = SARIMAX(train['n_transactions'], order=(1, 1, 1))
results = model.fit()
results.plot_diagnostics(figsize=(16, 8))
plt.show()
I compute the predictions:
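start = len(train)
end = len(train) + len(test) - 1
# SARIMAX handles the differencing internally, so predictions are already in levels (no typ= argument needed)
predictions = results.predict(start=start, end=end, dynamic=False).rename('SARIMA(1,1,1) Predictions')
title, xlabel, ylabel = 'Forecast vs. actual', 'Week', 'n_transactions'  # placeholder labels; not defined in the original snippet
ax = test['n_transactions'].plot(legend=True, figsize=(12, 6), title=title)
predictions.plot(legend=True)
ax.autoscale(axis='x', tight=True)
ax.set(xlabel=xlabel, ylabel=ylabel);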
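To judge whether the model is precise enough, it helps to score it and compare it with a naive forecast that simply repeats the last observed value; if SARIMA does not beat that baseline, the model is adding little. A sketch with made-up numbers; `forecast_errors` is a hypothetical helper and the naive value 98.0 stands in for the last training observation:

```python
import numpy as np
import pandas as pd


def forecast_errors(actual, predicted):
    """Return MAE, RMSE and MAPE (in %) for two series, aligned on their shared index."""
    actual, predicted = actual.align(predicted, join="inner")
    err = actual - predicted
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAPE": float(np.mean(np.abs(err / actual)) * 100),  # undefined if actual contains zeros
    }


# Hypothetical test weeks, model forecasts and naive forecasts
idx = pd.date_range("2020-05-03", periods=4, freq="W-SUN")
actual = pd.Series([100.0, 110.0, 90.0, 120.0], index=idx)
model_pred = pd.Series([105.0, 100.0, 95.0, 110.0], index=idx)
naive_pred = pd.Series(98.0, index=idx)  # last training value repeated

model_scores = forecast_errors(actual, model_pred)
naive_scores = forecast_errors(actual, naive_pred)
print(model_scores)
print(naive_scores)
```

Here the model's MAE is 7.5 against 11.0 for the naive forecast; on real data the comparison can easily go the other way.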
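Next, I want to see how well my model predicts the following week. I retrain the model every week, because for me only the next month really matters.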
# Walk forward through the last 20 weeks, refitting on everything before each week
test = df.iloc[-20:]
forecasts = []
for i in range(-20, 0):
    train = df.iloc[:i]  # all rows before the week being forecast
    model = SARIMAX(train['comission'], order=(1, 1, 1))
    results = model.fit()
    start = len(train)
    end = len(train)  # one step ahead: only the week right after the training data
    predictions = results.predict(start=start, end=end, dynamic=False)
    forecasts.append(predictions.to_frame('predictions'))
all_df = pd.concat(forecasts, axis=0, sort=False)
ax = test['comission'].plot(legend=True, figsize=(12, 6), title=title)
all_df['predictions'].plot(legend=True)
ax.autoscale(axis='x', tight=True)
ax.set(xlabel=xlabel, ylabel=ylabel);
What could be the reasons why my ARIMA model is not more accurate?
How should I interpret the output of results.plot_diagnostics()?