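Not positive semi-definite error when creating a time series in Python

Date: 2015-10-28 03:18:23

Tags: python pandas statistics statsmodels

I get an error when I try to fit some data with a VAR time series model in Python using statsmodels (see the statsmodels VAR documentation).

My data is in the dataframe df_IBM_training and looks like this: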

          date  sym    open      high       low     close  newscount  
6   2014.08.05  IBM  189.30  189.3000  186.4100  187.0800          4   
9   2014.08.06  IBM  185.80  186.8800  184.4400  185.9000          0   
12  2014.08.07  IBM  186.56  186.8800    1.0000  184.2800          2   
15  2014.08.08  IBM  183.32  186.6800  183.3200  186.5499         18 

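The VAR model I want to build looks like the equation below, and the code further down creates its regressors. The same code also searches for the ideal model order, which is where the error occurs. Each coefficient in the equation, for example α1,1 and γ1,11, corresponds to one regressor in the code: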

Δlog(C_t) =  α1,1(log(C_{t-1}) − log(O_{t-1}))
           + α1,2(log(C_{t-1}) − log(H_{t-1}))
           + α1,3(log(C_{t-1}) − log(L_{t-1}))
           + γ1,11Δlog(C_{t-1})
           + γ1,12Δlog(O_{t-1})
           + γ1,13Δlog(H_{t-1})
           + γ1,14Δlog(L_{t-1})
           + ε_t
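To make the timing in the equation concrete, here is a minimal sketch of how the left-hand side and the seven lagged regressors could be lined up as arrays. The helper name `build_close_equation` is hypothetical, and the input values are simply the four rows shown in the table above, so this only illustrates the indexing and is not a fitted model.

```python
import numpy as np

def build_close_equation(log_o, log_h, log_l, log_c):
    """Return (y, X): y is dlog(C_t), X holds the seven regressors dated t-1."""
    d_c, d_o, d_h, d_l = (np.diff(a) for a in (log_c, log_o, log_h, log_l))
    # d_x[k] is the change from row k to row k+1, so d_x[1:] is the change at
    # t = 2..n-1 and d_x[:-1] is the same series lagged by one period.
    y = d_c[1:]
    X = np.column_stack([
        (log_c - log_o)[1:-1],  # alpha 1,1 term
        (log_c - log_h)[1:-1],  # alpha 1,2 term
        (log_c - log_l)[1:-1],  # alpha 1,3 term
        d_c[:-1],               # gamma 1,11 term
        d_o[:-1],               # gamma 1,12 term
        d_h[:-1],               # gamma 1,13 term
        d_l[:-1],               # gamma 1,14 term
    ])
    return y, X

# The four rows from the table above (open, high, low, close).
log_o = np.log([189.30, 185.80, 186.56, 183.32])
log_h = np.log([189.30, 186.88, 186.88, 186.68])
log_l = np.log([186.41, 184.44, 1.00, 183.32])
log_c = np.log([187.08, 185.90, 184.28, 186.5499])

y, X = build_close_equation(log_o, log_h, log_l, log_c)
print(y.shape, X.shape)
```

With n input rows, two are lost (one to differencing, one to the lag), so four rows leave two usable observations.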

My code is below. For some reason, I get the following error on the line model.select_order(8):

  

numpy.linalg.linalg.LinAlgError: 7-th leading minor not positive definite

import numpy as np
import statsmodels.tsa.api

# VAR regressors
df_IBM_training['log_ret0'] = np.log(df_IBM_training.close) - np.log(df_IBM_training.close.shift(1)) 
df_IBM_training['log_ret1'] = np.log(df_IBM_training.open) - np.log(df_IBM_training.open.shift(1)) 
df_IBM_training['log_ret2'] = np.log(df_IBM_training.high) - np.log(df_IBM_training.high.shift(1)) 
df_IBM_training['log_ret3'] = np.log(df_IBM_training.low) - np.log(df_IBM_training.low.shift(1)) 
df_IBM_training = df_IBM_training[np.isfinite(df_IBM_training['log_ret3'])]

regressor_1 = np.log(df_IBM_training['close']) - np.log(df_IBM_training['open'])
regressor_2 = np.log(df_IBM_training['close']) - np.log(df_IBM_training['high'])
regressor_3 = np.log(df_IBM_training['close']) - np.log(df_IBM_training['low'])
regressor_4 = df_IBM_training['log_ret0']
regressor_5 = df_IBM_training['log_ret1']
regressor_6 = df_IBM_training['log_ret2']
regressor_7 = df_IBM_training['log_ret3']

X_IBM = [regressor_1, regressor_2, regressor_3, regressor_4,regressor_5, regressor_6, regressor_7]
X_IBM = np.array(X_IBM)
X_IBM = X_IBM.T

model = statsmodels.tsa.api.VAR(X_IBM)

#The line below is where the error arises
model.select_order(8)

Edit: the full traceback follows:

Traceback (most recent call last):
  File "TimeSeries.py", line 70, in <module>
    model.select_order(8)
  File "C:\Python34\lib\site-packages\statsmodels\tsa\vector_ar\var_model.py", line 505, in select_order
    for k, v in iteritems(result.info_criteria):
  File "C:\Python34\lib\site-packages\statsmodels\base\wrapper.py", line 35, in __getattribute__
    obj = getattr(results, attr)
  File "C:\Python34\lib\site-packages\statsmodels\tools\decorators.py", line 94, in __get__
    _cachedval = self.fget(obj)
  File "C:\Python34\lib\site-packages\statsmodels\tsa\vector_ar\var_model.py", line 1468, in info_criteria
    ld = logdet_symm(self.sigma_u_mle)
  File "C:\Python34\lib\site-packages\statsmodels\tools\linalg.py", line 213, in logdet_symm
    c, _ = linalg.cho_factor(m, lower=True)
  File "C:\Python34\lib\site-packages\scipy\linalg\decomp_cholesky.py", line 132, in cho_factor
    check_finite=check_finite)
  File "C:\Python34\lib\site-packages\scipy\linalg\decomp_cholesky.py", line 30, in _cholesky
    raise LinAlgError("%d-th leading minor not positive definite" % info)
numpy.linalg.linalg.LinAlgError: 5-th leading minor not positive definite
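
One hypothesis worth checking, which nothing in the question confirms: the seven columns are linked by exact identities across adjacent rows. For example, regressor_1[t] - regressor_1[t-1] equals log_ret0[t] - log_ret1[t], and the same holds for regressor_2 and regressor_3 with log_ret2 and log_ret3. Once one lag is in the model, those three identities would leave the residual covariance with rank at most 4, which would be consistent with the Cholesky factorization failing at the 5th leading minor. The sketch below checks this on synthetic prices (made-up data, not the IBM series), using plain NumPy least squares as a stand-in for the statsmodels fit; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic log prices (hypothetical data, not the series from the question).
log_c = np.log(100.0) + np.cumsum(0.01 * rng.standard_normal(n))
log_o = np.empty(n)
log_o[0] = np.log(100.0)
log_o[1:] = log_c[:-1] + 0.003 * rng.standard_normal(n - 1)
log_h = np.maximum(log_o, log_c) + 0.004 * np.abs(rng.standard_normal(n))
log_l = np.minimum(log_o, log_c) - 0.004 * np.abs(rng.standard_normal(n))

# Same seven columns as X_IBM; the first row is dropped because of the diff.
X = np.column_stack([
    (log_c - log_o)[1:],
    (log_c - log_h)[1:],
    (log_c - log_l)[1:],
    np.diff(log_c),
    np.diff(log_o),
    np.diff(log_h),
    np.diff(log_l),
])

# Identity across adjacent rows: r1[t] - r1[t-1] == log_ret0[t] - log_ret1[t].
identity_holds = np.allclose(X[1:, 0] - X[:-1, 0], X[1:, 3] - X[1:, 4])

# VAR(1) with intercept, fitted equation by equation with least squares.
Y = X[1:]
Z = np.column_stack([np.ones(len(Y)), X[:-1]])
coefs, *_ = np.linalg.lstsq(Z, Y, rcond=None)
resid = Y - Z @ coefs
sigma_u = resid.T @ resid / len(Y)

# Count eigenvalues that are not numerically zero relative to the largest.
eigvals = np.linalg.eigvalsh(sigma_u)
effective_rank = int(np.sum(eigvals > 1e-10 * eigvals.max()))
print(identity_holds, effective_rank)
```

If the same check holds on the real data, fitting the VAR on a subset of columns that breaks the identities (for example only the four log-return columns) might avoid the singular covariance. That is a suggestion to test, not a verified fix.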

0 answers:

No answers yet