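I am building a weekly cash flow forecast in Python with a SARIMAX model, but I am not satisfied with the results. I am using auto_arima to find the best order and seasonal order for SARIMA. I have more than 5 years of historical data, which should be enough to build a good model. My data looks like this: [figure: Historical Data]. Decomposing it with freq = 7 gives the following: [figure: statsmodels decomposition]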
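To put "more than 5 years" of weekly data in numbers, here is a minimal sketch that counts the observations and the number of 52-week seasonal cycles they span. The endpoints are taken from the `pd.date_range` call in the code further down; `count_weekly_points` is a hypothetical helper written only for this illustration, and it assumes both endpoints fall on a Monday.

```python
from datetime import date, timedelta

# Endpoints copied from the pd.date_range("20140113", "20200608", freq='W-MON') call.
start = date(2014, 1, 13)
end = date(2020, 6, 8)


def count_weekly_points(start, end):
    """Hypothetical helper: count weekly steps from start to end, inclusive."""
    n = 0
    current = start
    while current <= end:
        n += 1
        current += timedelta(weeks=1)
    return n


n_weeks = count_weekly_points(start, end)

# m = 52 is the seasonal period passed to auto_arima.
seasonal_cycles = n_weeks / 52

print("weekly observations:", n_weeks)
print("52-week cycles covered:", round(seasonal_cycles, 2))
```

With D = 1, seasonal differencing uses up the first 52 observations, so the seasonal terms are likely estimated from only a handful of yearly repetitions.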
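Best model: SARIMAX(1,0,1)(2,1,0)[52]
[figure: Forecast Result]
The Mean Squared Error of our forecasts is 4625364095.19
The Root Mean Squared Error of our forecasts is 68010.03
The RMSE is too high, so I am looking for help to improve the model's performance. A quick response would be appreciated.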
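For scale, the reported error can be compared with the typical size of a weekly amount. The sketch below recomputes the RMSE from the reported MSE and divides it by the mean of the 13 `actual` values from the code further down. It assumes those 13 values are representative of the period the error was measured on, which the post does not confirm.

```python
import math

# Reported MSE, copied from the output above.
mse = 4625364095.19

# The 13 actual weekly amounts, copied from the `actual` list in the code.
actual = [35592.63, 111814.61, 164527.43, 136719.53, 130048.37,
          66672.31, 151650.05, 98633.68, 218984.49, 32640.38,
          119842.40, 114052.16, 78411.80]

rmse = math.sqrt(mse)                     # root of the reported MSE
mean_actual = sum(actual) / len(actual)   # average weekly amount
relative_rmse = rmse / mean_actual        # error as a fraction of the mean

print("RMSE:", round(rmse, 2))
print("Mean actual:", round(mean_actual, 2))
print("RMSE / mean:", round(relative_rmse, 2))
```

Under that assumption the RMSE is a bit over half of an average weekly amount, which is why it looks too high to me.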
My code is as follows:
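import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from pmdarima import auto_arima

actual = [35592.63, 111814.61, 164527.43, 136719.53, 130048.37, 66672.31, 151650.05, 98633.68, 218984.49, 32640.38, 119842.40, 114052.16, 78411.80]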
dt = pd.date_range("20140113","20200608", freq='W-MON')
df2 = pd.read_csv('mse_ar_data.csv')
df2.index=dt
df2 = np.ceil(df2)
df2
stepwise_fit = auto_arima(df2, start_p=1, start_q=1,
                          max_p=5, max_q=5, m=52,
                          start_P=0, seasonal=True,
                          d=None, D=1, trace=True,
                          error_action='ignore',   # we don't want to know if an order does not work
                          suppress_warnings=True,  # we don't want convergence warnings
                          stepwise=True)
stepwise_fit.summary()
model = sm.tsa.statespace.SARIMAX(df2,
                                  order=(1, 0, 1),
                                  seasonal_order=(2, 1, 0, 52),
                                  enforce_stationarity=False,
                                  enforce_invertibility=False)
results_ar = model.fit()
print(results_ar.summary().tables[1])
#Diagnostic Plot
results_ar.plot_diagnostics(figsize=(16, 8))
plt.show()
#Prediction
pred_ar = results_ar.get_prediction(start=pd.to_datetime('2020-03-02'), dynamic=False)
pred_ar_ci = pred_ar.conf_int()
ax = df2['2016-01':].plot(label='observed')
pred_ar.predicted_mean.plot(ax=ax, label='One-step ahead Forecast', alpha=.7, figsize=(14, 7))
ax.fill_between(pred_ar_ci.index,
                pred_ar_ci.iloc[:, 0],
                pred_ar_ci.iloc[:, 1], color='k', alpha=.2)
ax.set_xlabel('Date')
ax.set_ylabel('Cash Inflow AR')
plt.legend()
plt.show()
y_forecasted = pred_ar.predicted_mean
y_truth = df2['2020-03-02':]['ar_amount']
mse = ((y_forecasted - y_truth) ** 2).mean()
print('\nThe Mean Squared Error of our forecasts is {}'.format(round(mse, 2)))
print('The Root Mean Squared Error of our forecasts is {}'.format(round(np.sqrt(mse), 2)))
# pred_ar_uc was never defined; use the prediction object created above
forcast_ar = pd.DataFrame({'Actual': actual, 'Forecasted': pred_ar.predicted_mean})
forcast_ar = forcast_ar.round(2)
forcast_ar['Delta'] = forcast_ar['Forecasted']-forcast_ar['Actual']
print(forcast_ar)
total_delta = round(np.abs(forcast_ar.Delta).sum(),2)
avg_delta = round(np.abs(forcast_ar.Delta).mean(),2)
print('\nTotal Delta:',total_delta)
print('Average Delta:',avg_delta)