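auto.arima() equivalent for Python

Time: 2014-03-31 19:22:58

Tags: python r time-series statsmodels forecasting

I am trying to predict weekly sales using ARMA and ARIMA models. I could not find a function in statsmodels for tuning the order (p, d, q). R currently has the function forecast::auto.arima(), which tunes the (p, d, q) parameters.

How do I choose the right order for my model? Are there any libraries available in Python for this purpose?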

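8 answers:

Answer 0: (score: 57)

There are several approaches you can implement:

  1. ARIMAResults includes aic and bic. By definition, these criteria penalize the number of parameters in the model, so you can use them to compare models. scipy also has optimize.brute, which performs a grid search over the specified parameter space. A workflow like this should work: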

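    from scipy.optimize import brute
    # statsmodels < 0.13; later releases moved ARIMA to statsmodels.tsa.arima.model
    from statsmodels.tsa.arima_model import ARIMA

    def objfunc(order, exog, endog):
        # brute passes each grid point as a float array; ARIMA needs integer orders
        order = tuple(int(v) for v in order)
        fit = ARIMA(endog, order, exog).fit()
        return fit.aic  # aic is an attribute, not a method

    # search p, d, q over {1, 2} each (slice upper bounds are exclusive)
    grid = (slice(1, 3, 1), slice(1, 3, 1), slice(1, 3, 1))
    # exog and endog are your regressors and your series
    brute(objfunc, grid, args=(exog, endog), finish=None)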
    

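    Make sure you call brute with finish=None. Otherwise brute polishes the best grid point with a continuous optimizer, which makes no sense for integer orders.

  2. You can get pvalues from ARIMAResults. A step-forward algorithm is therefore easy to implement: at each step, increase the degree of the model along whichever dimension yields the lowest p-value for the added parameter.

  3. Use ARIMAResults.predict to cross-validate alternative models. The best approach is to hold out the tail of the time series (say the most recent 5% of the data) and use those points to obtain the test error of the fitted models.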
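To make the penalty in approach 1 concrete, here is a minimal sketch of the textbook definitions, AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), where k is the number of estimated parameters, n the number of observations, and L the maximized likelihood. The function names and the log-likelihood values are made up for illustration. statsmodels reports aic and bic for you, and the way it counts parameters may differ slightly from this sketch.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: penalizes parameters more as n_obs grows."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical fits: the larger model fits slightly better (higher log-likelihood)
# but uses four more parameters.
candidates = {
    (1, 1, 1): {"llf": -520.0, "k": 3},
    (3, 1, 3): {"llf": -518.5, "k": 7},
}
n_obs = 200

scores = {
    order: (aic(c["llf"], c["k"]), bic(c["llf"], c["k"], n_obs))
    for order, c in candidates.items()
}
best_by_aic = min(scores, key=lambda order: scores[order][0])
best_by_bic = min(scores, key=lambda order: scores[order][1])
print(best_by_aic, best_by_bic)
```

In this made-up case the small gain in log-likelihood does not pay for the extra parameters, so both criteria prefer the smaller model.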

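Answer 1: (score: 11)

Answer 2: (score: 2)

I wrote these utility functions to compute the p, d, q values directly. get_PDQ_parallel takes the input data, a series indexed by timestamps (datetime). n_jobs sets the number of parallel worker processes. The output is a DataFrame of aic and bic values with order=(P,D,Q) as the index. The p and q ranges are [0,12] and the d range is [0,1].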

import numpy as np
import pandas as pd
from functools import partial
from multiprocessing import Pool
# statsmodels < 0.13; later releases moved ARIMA to statsmodels.tsa.arima.model
from statsmodels.tsa.arima_model import ARIMA

def get_aic_bic(order, series):
    """Fit one ARIMA order; return (aic, bic), or NaN for both if the fit fails."""
    aic = np.nan
    bic = np.nan
    try:
        # freq='H' assumes hourly data; change it to match your index
        arima_mod = ARIMA(series, order=order, freq='H').fit(transparams=True, method='css')
        aic = arima_mod.aic
        bic = arima_mod.bic
        print(order, aic, bic)
    except Exception:
        # many orders fail to converge; leave their scores as NaN
        pass
    return aic, bic

def get_PDQ_parallel(data, n_jobs=7):
    p_val = 13   # p in [0, 12]
    q_val = 13   # q in [0, 12]
    d_vals = 2   # d in [0, 1]
    pdq_vals = [(p, d, q) for p in range(p_val) for d in range(d_vals) for q in range(q_val)]
    get_aic_bic_partial = partial(get_aic_bic, series=data)
    # fit the candidate orders in parallel worker processes
    with Pool(n_jobs) as pool:
        res = pool.map(get_aic_bic_partial, pdq_vals)
    return pd.DataFrame(res, index=pdq_vals, columns=['aic', 'bic'])
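The grid in get_PDQ_parallel covers 13 x 2 x 13 = 338 candidate orders, and failed fits come back as NaN, so picking a winner means skipping those entries. The following is a rough, standard-library-only sketch of both steps. The helper names enumerate_orders and best_order are hypothetical and the scores are invented; with the DataFrame returned above you would more likely sort by the aic column instead.

```python
import math
from itertools import product

def enumerate_orders(max_p=12, max_d=1, max_q=12):
    """List every (p, d, q) in the grid, in the same order as the nested loops above."""
    return list(product(range(max_p + 1), range(max_d + 1), range(max_q + 1)))

def best_order(results):
    """Pick the order with the lowest AIC from {order: (aic, bic)}, ignoring NaN fits."""
    valid = {order: s for order, s in results.items() if not math.isnan(s[0])}
    return min(valid, key=lambda order: valid[order][0])

orders = enumerate_orders()
print(len(orders))

# Invented scores; the first entry mimics a fit that failed and returned NaN.
example_results = {
    (0, 0, 0): (float("nan"), float("nan")),
    (1, 0, 1): (100.0, 110.0),
    (2, 1, 2): (95.0, 120.0),
}
print(best_order(example_results))
```

Note that the second and third entries disagree on BIC, so the choice of criterion can change the selected order.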

Answer 3: (score: 2)

A possible solution

  // `any` is the String being checked
  int counter = 0;
  for (int i = 0; i < any.length(); i++) {
      if (any.charAt(i) == '(') {
          counter++;
      } else if (any.charAt(i) == ')') {
          counter--;
      }
      if (counter < 0) {
          System.out.println("Close bracket with no open bracket found");
          break;
      }
  }

  if (counter > 0) {
      System.out.println("An open bracket was never closed");
  }

From https://www.digitalocean.com/community/tutorials/a-guide-to-time-series-forecasting-with-arima-in-python-3

See also https://github.com/decisionstats/pythonfordatascience/blob/master/time%2Bseries%20(1).ipynb

Answer 4: (score: 2)

import warnings
from datetime import datetime
from sklearn.metrics import mean_squared_error
# statsmodels < 0.13; later releases moved ARIMA to statsmodels.tsa.arima.model
from statsmodels.tsa.arima_model import ARIMA

# evaluate one ARIMA order (p, d, q) by walk-forward validation
def evaluate_arima_model(X, arima_order):
    # prepare training dataset
    train_size = int(len(X) * 0.90)
    train, test = X[0:train_size], X[train_size:]
    history = [x for x in train]
    # make one-step predictions, refitting after each new observation
    predictions = list()
    for t in range(len(test)):
        model = ARIMA(history, order=arima_order)
        model_fit = model.fit(disp=0)
        yhat = model_fit.forecast()[0]
        predictions.append(yhat)
        history.append(test[t])
    # calculate out of sample error
    error = mean_squared_error(test, predictions)
    return error

# evaluate combinations of p, d and q values for an ARIMA model
def evaluate_models(dataset, p_values, d_values, q_values):
    dataset = dataset.astype('float32')
    best_score, best_cfg = float("inf"), None
    for p in p_values:
        for d in d_values:
            for q in q_values:
                order = (p,d,q)
                try:
                    mse = evaluate_arima_model(dataset, order)
                    if mse < best_score:
                        best_score, best_cfg = mse, order
                    print('ARIMA%s MSE=%.3f' % (order,mse))
                except Exception:
                    # skip orders that fail to fit
                    continue
    print('Best ARIMA%s MSE=%.3f' % (best_cfg, best_score))

# date parser for loading the dataset (the loading code itself is not shown)
def parser(x):
    return datetime.strptime('190'+x, '%Y-%m')

p_values = [4,5,6,7,8]
d_values = [0,1,2]
q_values = [2,3,4,5,6]
warnings.filterwarnings("ignore")
# train: 1-D NumPy array of observations, loaded beforehand
evaluate_models(train, p_values, d_values, q_values)

This gives you the p, d, q values; then use those values in your ARIMA model.
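The walk-forward scheme in evaluate_arima_model does not depend on ARIMA itself, which makes it easy to check in isolation. Below is a minimal sketch using only the standard library, in which a pluggable one-step forecaster stands in for the ARIMA fit. The names walk_forward_mse, naive_forecast and mean_forecast are hypothetical, and the toy series is invented.

```python
def walk_forward_mse(series, forecast_one_step, train_fraction=0.9):
    """Mean squared error of one-step forecasts over the held-out tail.

    After each prediction the true value is appended to the history,
    mirroring the loop in evaluate_arima_model. Assumes the tail is non-empty.
    """
    train_size = int(len(series) * train_fraction)
    history = list(series[:train_size])
    test = list(series[train_size:])
    squared_errors = []
    for actual in test:
        prediction = forecast_one_step(history)
        squared_errors.append((actual - prediction) ** 2)
        history.append(actual)
    return sum(squared_errors) / len(squared_errors)

def naive_forecast(history):
    """Stand-in model: predict the last observed value."""
    return history[-1]

def mean_forecast(history):
    """Stand-in model: predict the mean of everything seen so far."""
    return sum(history) / len(history)

# Toy upward-trending series 1.0, 2.0, ..., 20.0
series = [float(v) for v in range(1, 21)]
naive_mse = walk_forward_mse(series, naive_forecast)
mean_mse = walk_forward_mse(series, mean_forecast)
print(naive_mse, mean_mse)
```

On a trending series the last-value forecaster scores far better than the running mean, which is the kind of comparison the grid search above makes between ARIMA orders.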

Answer 5: (score: 1)

As of now, we can use the pyramid-arima package from PyPI directly.

See https://pypi.org/project/pyramid-arima/

Answer 6: (score: -1)

In conda, install it with conda install -c saravji pmdarima.

The user saravji has published it on Anaconda Cloud.

Then use

from pmdarima.arima import auto_arima

(Note that the name pyramid-arima was changed to pmdarima.)
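As a rough sketch of how the imported function might be used, assuming pmdarima is installed: the wrapper name select_order is hypothetical, and the keyword arguments reflect my understanding of auto_arima's signature, so check them against the version you have installed.

```python
def select_order(series, max_p=5, max_q=5):
    """Return the (p, d, q) order chosen by pmdarima's stepwise search.

    series: 1-D array-like of observations (for example weekly sales).
    """
    # Imported lazily so this module still loads where pmdarima is absent.
    from pmdarima.arima import auto_arima

    model = auto_arima(
        series,
        start_p=0, start_q=0,
        max_p=max_p, max_q=max_q,
        seasonal=False,          # non-seasonal ARIMA only, as in the question
        stepwise=True,           # stepwise search, similar in spirit to R's auto.arima
        suppress_warnings=True,
        error_action="ignore",   # skip orders that fail to fit
    )
    return model.order
```

Calling select_order(weekly_sales) on your own series returns a (p, d, q) tuple that can then be passed to an ARIMA model.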

Answer 7: (score: -3)

Actually

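def objfunc(order, *params):
    # statsmodels < 0.13; later releases moved ARIMA to statsmodels.tsa.arima.model
    from statsmodels.tsa.arima_model import ARIMA
    exog, endog = params
    # brute passes each grid point as a float array; ARIMA needs integer orders
    order = tuple(int(v) for v in order)
    fit = ARIMA(endog, order, exog).fit()
    return fit.aic  # aic is an attribute, not a method
from scipy.optimize import brute
grid = (slice(1, 3, 1), slice(1, 3, 1), slice(1, 3, 1))
# exog and endog are your regressors and your series
params = (exog, endog)
brute(objfunc, grid, args=params, finish=None)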