How to incorporate and forecast lagged time-series variables in a Python regression model

Time: 2019-10-07 22:31:30

Tags: python scikit-learn statistics statsmodels

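I'm trying to figure out how to incorporate lagged dependent variables into statsmodels or scikit-learn to forecast time series with AR terms, but I can't seem to find a solution.

The general linear equation looks like this:

y(t) = B1 * y(t-1) + B2 * x1(t) + B3 * x2(t-3) + e(t)

I know I can use pd.Series.shift(t) to create lagged variables, add them to the model, and estimate the parameters. But how do I get forecasts when the code doesn't know which variable is the lagged dependent variable?

In SAS's Proc Autoreg you can specify which variable is the lagged dependent variable and it will forecast accordingly, but there doesn't seem to be an option like that in Python.

Any help would be greatly appreciated. Thanks in advance.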

1 answer:

Answer 0: (score: 0)

Since you mentioned statsmodels in the tags, you may want to have a look at statsmodels - ARIMA, i.e.:

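# The older statsmodels.tsa.arima_model module was removed in statsmodels 0.13
from statsmodels.tsa.arima.model import ARIMA

# t is your time series (e.g. a pandas Series of the dependent variable)
model = ARIMA(endog=t, order=(2, 0, 0))  # p=2, d=0, q=0 for AR(2)
fit = model.fit()
print(fit.summary())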

But as you mentioned, you can also create the new variables manually in the way you described (I used some random data):

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Example data: 'value' from the a10 dataset becomes x1; y and x2 are random
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv', parse_dates=['date'])
df['random_variable'] = np.random.randint(0, 10, len(df))
df['y'] = np.random.rand(len(df))
df.index = df['date']
df = df[['y', 'value', 'random_variable']]
df.columns = ['y', 'x1', 'x2']

shifts = 3

# Add lags 1..shifts of every original column, e.g. 'y AR(1)' or 'x2 AR(3)'
for variable in list(df.columns):
    for t in range(1, shifts + 1):
        df[f'{variable} AR({t})'] = df[variable].shift(t)

# The first `shifts` rows have NaN in the lagged columns, so drop them
df = df.dropna()
>>> df.head()
                   y        x1  x2    ...     x2 AR(1)  x2 AR(2)  x2 AR(3)
date                                  ...                                 
1991-10-01  0.715115  3.611003   7    ...          5.0       7.0       7.0
1991-11-01  0.202662  3.565869   3    ...          7.0       5.0       7.0
1991-12-01  0.121624  4.306371   7    ...          3.0       7.0       5.0
1992-01-01  0.043412  5.088335   6    ...          7.0       3.0       7.0
1992-02-01  0.853334  2.814520   2    ...          6.0       7.0       3.0
[5 rows x 12 columns]

I'm using the model you described in your post:

model = sm.OLS(df['y'], df[['y AR(1)', 'x1', 'x2 AR(3)']])
fit = model.fit()
>>> fit.summary()
<class 'statsmodels.iolib.summary.Summary'>
"""
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.696
Model:                            OLS   Adj. R-squared:                  0.691
Method:                 Least Squares   F-statistic:                     150.8
Date:                Tue, 08 Oct 2019   Prob (F-statistic):           6.93e-51
Time:                        17:51:20   Log-Likelihood:                -53.357
No. Observations:                 201   AIC:                             112.7
Df Residuals:                     198   BIC:                             122.6
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
y AR(1)        0.2972      0.072      4.142      0.000       0.156       0.439
x1             0.0211      0.003      6.261      0.000       0.014       0.028
x2 AR(3)       0.0161      0.007      2.264      0.025       0.002       0.030
==============================================================================
Omnibus:                        2.115   Durbin-Watson:                   2.277
Prob(Omnibus):                  0.347   Jarque-Bera (JB):                1.712
Skew:                           0.064   Prob(JB):                        0.425
Kurtosis:                       2.567   Cond. No.                         41.5
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
"""

Hope this helps you get started.