I have a dataset of total monthly sales, with the date set as the index:
SumSalesValue
Date
2011-01-01 7746056
2011-02-01 6585410
2011-03-01 7065595
2011-04-01 7322146
2011-05-01 8341621
2011-06-01 7707603
2011-07-01 8899279
2011-08-01 8209745
2011-09-01 7714118
2011-10-01 8957886
2011-11-01 8008410
2011-12-01 9697578
2012-01-01 9586926
2012-02-01 8264172
2012-03-01 8435335
2012-04-01 8209244
2012-05-01 9909858
2012-06-01 8428824
2012-07-01 9864037
2012-08-01 8952514
2012-09-01 9030655
2012-10-01 10579182
2012-11-01 9706230
2012-12-01 9939929
2013-01-01 11493645
2013-02-01 10369875
2013-03-01 10760833
2013-04-01 10408647
2013-05-01 12220684
2013-06-01 11059714
2013-07-01 12903194
2013-08-01 11368418
2013-09-01 11231536
2013-10-01 12682956
2013-11-01 11331284
2013-12-01 11410860
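To make the question reproducible, the table above can be rebuilt as a DataFrame. This is a minimal sketch with pandas. It assumes the variable name `bhp_dry_other` (taken from the code further down) and an explicit month-start frequency (`freq="MS"`), which the original index may or may not have. With an explicit frequency, statsmodels can use the dates directly instead of warning that it had to infer one.

```python
import pandas as pd

# Monthly SumSalesValue, copied from the table above (January 2011 to December 2013).
values = [
    7746056, 6585410, 7065595, 7322146, 8341621, 7707603,  # 2011
    8899279, 8209745, 7714118, 8957886, 8008410, 9697578,
    9586926, 8264172, 8435335, 8209244, 9909858, 8428824,  # 2012
    9864037, 8952514, 9030655, 10579182, 9706230, 9939929,
    11493645, 10369875, 10760833, 10408647, 12220684, 11059714,  # 2013
    12903194, 11368418, 11231536, 12682956, 11331284, 11410860,
]

# A month-start ("MS") DatetimeIndex named "Date", matching the layout shown above.
bhp_dry_other = pd.DataFrame(
    {"SumSalesValue": values},
    index=pd.date_range("2011-01-01", periods=len(values), freq="MS", name="Date"),
)

print(bhp_dry_other.tail(3))
```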
I'm using a seasonal ARIMA model to forecast 2014, working in Python. I've already made the data stationary, and I found the best p, d and q parameters with this grid search:
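The stationarity step itself isn't shown. This is a minimal sketch that assumes it consisted of one regular difference plus one seasonal difference at lag 12, matching d=1 and D=1, s=12 in the `seasonal_order=(0,1,1,12)` of the SARIMAX call later in the question. It shows how much of the 36-month history is left once both differences are applied. `difference` is a hypothetical helper, not a library function.

```python
def difference(values, lag=1):
    """Return values[i] - values[i - lag]; the first `lag` points are lost."""
    return [values[i] - values[i - lag] for i in range(lag, len(values))]


# Monthly SumSalesValue, January 2011 to December 2013 (36 points, from the table above).
sales = [
    7746056, 6585410, 7065595, 7322146, 8341621, 7707603,
    8899279, 8209745, 7714118, 8957886, 8008410, 9697578,
    9586926, 8264172, 8435335, 8209244, 9909858, 8428824,
    9864037, 8952514, 9030655, 10579182, 9706230, 9939929,
    11493645, 10369875, 10760833, 10408647, 12220684, 11059714,
    12903194, 11368418, 11231536, 12682956, 11331284, 11410860,
]

first_diff = difference(sales, lag=1)           # regular difference, d = 1
seasonal_diff = difference(first_diff, lag=12)  # seasonal difference, D = 1, s = 12

print(len(sales), len(first_diff), len(seasonal_diff))  # → 36 35 23
```

Only 23 points remain, which is less than two full seasonal cycles, to estimate the ARMA terms from.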
import warnings

from sklearn.metrics import mean_squared_error
# statsmodels >= 0.13 removed statsmodels.tsa.arima_model.ARIMA (and its fit(disp=0));
# the replacement class lives in statsmodels.tsa.arima.model.
from statsmodels.tsa.arima.model import ARIMA

# evaluate an ARIMA model for a given order (p, d, q) with walk-forward validation
def evaluate_arima_model(X, arima_order):
    # prepare training dataset
    train_size = int(len(X) * 0.66)
    train, test = X[0:train_size], X[train_size:]
    history = [x for x in train]
    # make one-step predictions, adding each true value to the history afterwards
    predictions = list()
    for t in range(len(test)):
        model = ARIMA(history, order=arima_order)
        model_fit = model.fit()
        yhat = model_fit.forecast()[0]
        predictions.append(yhat)
        history.append(test[t])
    # calculate out-of-sample error
    error = mean_squared_error(test, predictions)
    return error

# evaluate combinations of p, d and q values for an ARIMA model
def evaluate_models(dataset, p_values, d_values, q_values):
    dataset = dataset.astype('float32')
    best_score, best_cfg = float("inf"), None
    for p in p_values:
        for d in d_values:
            for q in q_values:
                order = (p, d, q)
                try:
                    mse = evaluate_arima_model(dataset, order)
                    if mse < best_score:
                        best_score, best_cfg = mse, order
                    print('ARIMA%s MSE=%.3f' % (order, mse))
                except Exception:
                    # skip orders that fail to fit
                    continue
    print('Best ARIMA%s MSE=%.3f' % (best_cfg, best_score))

# evaluate parameters
p_values = [0, 1, 2, 4, 6, 8, 10]
d_values = range(0, 3)
q_values = range(0, 3)
warnings.filterwarnings("ignore")
evaluate_models(bhp_dry_other.SumSalesValue.values, p_values, d_values, q_values)
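The grid search above uses walk-forward (expanding-window) validation. This small pure-Python sketch reproduces only its bookkeeping, assuming the same 36 monthly points and the 0.66 split, to show how much data each fit sees and how many fits the search runs. `walk_forward_plan` is a hypothetical helper.

```python
def walk_forward_plan(n_obs, train_frac=0.66):
    """Mirror the split in evaluate_arima_model: a fixed initial training set,
    then one-step forecasts with a history that grows by one point per step."""
    train_size = int(n_obs * train_frac)
    # History length available when forecasting test point t.
    history_lengths = [train_size + t for t in range(n_obs - train_size)]
    return train_size, history_lengths


p_values = [0, 1, 2, 4, 6, 8, 10]
d_values = range(0, 3)
q_values = range(0, 3)

train_size, history_lengths = walk_forward_plan(36)
n_configs = len(p_values) * len(d_values) * len(q_values)
total_fits = n_configs * len(history_lengths)

print(train_size, len(history_lengths), history_lengths[0], history_lengths[-1])  # → 23 13 23 35
print(n_configs, total_fits)  # → 63 819
```

The first fits see only 23 months, so high orders such as p=10 may fail and be skipped silently by the `try`/`except`. The search also covers only the non-seasonal (p, d, q) terms, while the model below adds a seasonal component that was never part of the comparison.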
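All that's left is to fit the model and make the forecast, and this is where I'm not sure how to proceed or what I'm doing wrong. When I try to fit the model like this: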
import statsmodels.api as sm

mod = sm.tsa.statespace.SARIMAX(bhp_dry_other.SumSalesValue, trend='n', order=(2, 1, 0), seasonal_order=(0, 1, 1, 12))
results = mod.fit()
I get the following error:
ValueError: maxlag should be < nobs
Any suggestions?
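The check statsmodels performs internally isn't shown here, so the following is only a rough pre-flight sketch. It assumes the error is related to how few observations remain after both differences are applied. The helper names and the two-seasonal-cycle threshold are hypothetical and not part of statsmodels.

```python
def differenced_nobs(nobs, d, D, s):
    """Observations left after d regular and D seasonal (lag-s) differences."""
    return nobs - d - D * s


def max_polynomial_lag(order, seasonal_order):
    """Highest lag in the expanded AR and MA polynomials of a SARIMA model."""
    p, _, q = order
    P, _, Q, s = seasonal_order
    return max(p + P * s, q + Q * s)


def has_enough_cycles(nobs, order, seasonal_order, min_cycles=2):
    """Hypothetical heuristic: require min_cycles full seasons after differencing."""
    _, d, _ = order
    _, D, _, s = seasonal_order
    return differenced_nobs(nobs, d, D, s) >= min_cycles * s


order, seasonal_order = (2, 1, 0), (0, 1, 1, 12)  # the orders from the SARIMAX call
print(differenced_nobs(36, 1, 1, 12))                # → 23
print(max_polynomial_lag(order, seasonal_order))     # → 12
print(has_enough_cycles(36, order, seasonal_order))  # → False
print(has_enough_cycles(36, order, (0, 0, 1, 12)))   # → True (D = 0 keeps 35 points)
```

Under this heuristic, `seasonal_order=(0,1,1,12)` leaves 23 differenced points, one short of two full years, while the seasonal MA term reaches back 12 lags.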