How to forecast sales for 2014

Time: 2018-03-29 21:05:43

Tags: machine-learning forecasting

I have a dataset of total monthly sales (the date is set as the index):

            SumSalesValue
Date    
2011-01-01  7746056
2011-02-01  6585410
2011-03-01  7065595
2011-04-01  7322146
2011-05-01  8341621
2011-06-01  7707603
2011-07-01  8899279
2011-08-01  8209745
2011-09-01  7714118
2011-10-01  8957886
2011-11-01  8008410
2011-12-01  9697578
2012-01-01  9586926
2012-02-01  8264172
2012-03-01  8435335
2012-04-01  8209244
2012-05-01  9909858
2012-06-01  8428824
2012-07-01  9864037
2012-08-01  8952514
2012-09-01  9030655
2012-10-01  10579182
2012-11-01  9706230
2012-12-01  9939929
2013-01-01  11493645
2013-02-01  10369875
2013-03-01  10760833
2013-04-01  10408647
2013-05-01  12220684
2013-06-01  11059714
2013-07-01  12903194
2013-08-01  11368418
2013-09-01  11231536
2013-10-01  12682956
2013-11-01  11331284
2013-12-01  11410860

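I'm using a seasonal ARIMA model to forecast 2014, and I'm doing the project in Python. I have already gone through the steps to make the data stationary, and I have searched for the best values of the parameters p, d and q: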

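import warnings

import statsmodels.api as sm
from sklearn.metrics import mean_squared_error
# legacy ARIMA class (removed in statsmodels 0.13); fit(disp=0) below relies on it
from statsmodels.tsa.arima_model import ARIMA

# evaluate one ARIMA order with walk-forward validation and return the MSE
def evaluate_arima_model(X, arima_order):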
    # prepare training dataset
    train_size = int(len(X) * 0.66)
    train, test = X[0:train_size], X[train_size:]
    history = [x for x in train]
    # make predictions
    predictions = list()
    for t in range(len(test)):
        model = ARIMA(history, order=arima_order)
        model_fit = model.fit(disp=0)
        yhat = model_fit.forecast()[0]
        predictions.append(yhat)
        history.append(test[t])
    # calculate out of sample error
    error = mean_squared_error(test, predictions)
    return error

# evaluate combinations of p, d and q values for an ARIMA model
def evaluate_models(dataset, p_values, d_values, q_values):
    dataset = dataset.astype('float32')
    best_score, best_cfg = float("inf"), None
    for p in p_values:
        for d in d_values:
            for q in q_values:
                order = (p,d,q)
                try:
                    mse = evaluate_arima_model(dataset, order)
                    if mse < best_score:
                        best_score, best_cfg = mse, order
                    print('ARIMA%s MSE=%.3f' % (order,mse))
                except Exception:
                    # skip orders that fail to fit
                    continue
    print('Best ARIMA%s MSE=%.3f' % (best_cfg, best_score))

# evaluate parameters
p_values = [0, 1, 2, 4, 6, 8, 10]
d_values = range(0, 3)
q_values = range(0, 3)
warnings.filterwarnings("ignore")
evaluate_models(bhp_dry_other.SumSalesValue.values, p_values, d_values, q_values)
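
The evaluation loop above is a walk-forward validation: with 36 monthly observations, `int(36 * 0.66)` puts the first 23 in the training set and the remaining 13 in the test set, and the model is refit before each one-step forecast. As a minimal sketch of that structure without the statsmodels dependency, the version below swaps in a last-value forecaster. `walk_forward_mse` and `last_value_forecast` are hypothetical names used only for illustration; they are not part of the original code.

```python
def last_value_forecast(history):
    """Stand-in model: predict that the next value equals the last observed one."""
    return history[-1]


def walk_forward_mse(series, forecast_fn, train_fraction=0.66):
    """One-step-ahead walk-forward validation; returns the mean squared error."""
    train_size = int(len(series) * train_fraction)
    history = list(series[:train_size])
    test = list(series[train_size:])
    squared_errors = []
    for actual in test:
        predicted = forecast_fn(history)   # forecast using only past values
        squared_errors.append((actual - predicted) ** 2)
        history.append(actual)             # reveal the true value, then move on
    return sum(squared_errors) / len(squared_errors)


# Toy input (not the sales data): int(6 * 0.66) = 3 training points, 3 test points
toy_mse = walk_forward_mse([1, 2, 3, 4, 5, 6], last_value_forecast)
print(toy_mse)
```

Replacing `last_value_forecast` with a function that fits an ARIMA model on `history` gives the same loop as in the question, and the naive score can serve as a baseline when judging the ARIMA MSE.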

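All that's left is to fit the model and make the forecast. I'm not sure how to proceed or what I'm doing wrong, but this attempt to fit the model fails: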

mod = sm.tsa.statespace.SARIMAX(bhp_dry_other.SumSalesValue, trend='n', order=(2,1,0), seasonal_order=(0,1,1,12))
results = mod.fit()

I get the following error:

ValueError: maxlag should be < nobs

Any suggestions?

1 answer:

Answer 0: (score: 0)

Have you checked this issue? Since you don't have a large dataset, you may need to pass an additional argument to the fit function.
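
To make the small-sample point concrete, here is a minimal sketch (plain Python, no statsmodels) that applies the differencing implied by `order=(2,1,0)` and `seasonal_order=(0,1,1,12)` to the 36 values from the question. A first difference costs one observation and a seasonal difference at lag 12 costs twelve, so only 23 values remain for estimating the seasonal terms. That this shortage is what triggers `maxlag should be < nobs` is an assumption consistent with the message, not something checked against the statsmodels source, and `difference` is a hypothetical helper.

```python
# Monthly SumSalesValue figures copied from the question
sales = [
    7746056, 6585410, 7065595, 7322146, 8341621, 7707603,
    8899279, 8209745, 7714118, 8957886, 8008410, 9697578,        # 2011
    9586926, 8264172, 8435335, 8209244, 9909858, 8428824,
    9864037, 8952514, 9030655, 10579182, 9706230, 9939929,       # 2012
    11493645, 10369875, 10760833, 10408647, 12220684, 11059714,
    12903194, 11368418, 11231536, 12682956, 11331284, 11410860,  # 2013
]


def difference(series, lag):
    """Return series[i] - series[i - lag] for every i where both values exist."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


after_d = difference(sales, 1)      # regular differencing, d=1
after_D = difference(after_d, 12)   # seasonal differencing, D=1 with s=12

print(len(sales), len(after_d), len(after_D))  # 36 35 23
```

With fewer than two full seasonal cycles left after differencing, there is little data to estimate a lag-12 term from, which is why a longer history or a simpler seasonal specification is worth considering.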