I have the following sample data for performing a rolling OLS calculation (I am running this from the debugger):
(Pdb) rhs
['Yield']
(Pdb) lhs
'Returns'
(Pdb) min_periods
12
(Pdb) window
60
(Pdb) intercept
True
(Pdb) print df[rhs].to_string()
Yield
EndOfMonthDate
2001-08-31 0.0561
2001-09-28 0.0360
2001-10-31 0.0500
2001-11-30 0.0500
2001-12-31 0.0500
2002-01-31 0.0191
2002-02-28 0.0563
2002-03-29 0.0557
2002-04-30 0.0600
2002-05-31 0.0569
2002-06-28 0.0571
2002-07-31 0.0450
2002-08-30 0.0416
2002-09-30 0.0360
2002-10-31 0.0395
2002-11-29 0.0422
2010-05-31 0.0323
2010-06-30 0.0311
2010-07-30 0.0300
2010-07-30 0.0300
2010-08-31 0.0251
2010-08-31 0.0251
2010-09-30 0.0250
2010-10-29 0.0271
2010-11-30 0.0287
2010-12-31 0.0347
2010-12-31 0.0347
2012-01-31 0.0201
2012-02-29 0.0197
2012-03-30 0.0220
2012-04-30 0.0199
2012-07-31 0.0141
(Pdb) print df[lhs].to_string()
2001-08-31 -0.005519
2001-09-28 -0.350356
2001-10-31 10.003698
2001-11-30 3.230476
2001-12-31 -3.776050
2002-01-31 9.153807
2002-02-28 -4.175085
2002-03-29 46.890701
2002-04-30 -15.747041
2002-05-31 2.797472
2002-06-28 -1.000851
2002-07-31 -13.398200
2002-08-30 -1.707745
2002-09-30 2.054250
2002-10-31 0.000620
2002-11-29 -9.790426
2010-05-31 0.000012
2010-06-30 0.000012
2010-07-30 -1.745182
2010-07-30 -0.000006
2010-08-31 -20.779633
2010-08-31 0.000000
2010-09-30 -0.000006
2010-10-29 -0.000012
2010-11-30 -0.000006
2010-12-31 30.165554
2010-12-31 -2.549851
2012-01-31 -6.892008
2012-02-29 -1.638216
2012-03-30 4.295588
2012-04-30 -7.094216
2012-07-31 -0.041252
When I attempt the rolling OLS:
(Pdb) pandas.ols(y=df[lhs], x=df[rhs], window=window, min_periods=min_periods, intercept=intercept)
*** TypeError: unsupported operand type(s) for +: 'slice' and 'int'
But if I just run a regular OLS over the entire data range, it looks fine:
(Pdb) pandas.ols(y=df[lhs], x=df[rhs], intercept=intercept)
-------------------------Summary of Regression Analysis-------------------------
Formula: Y ~ <Yield> + <intercept>
Number of Observations: 38
Number of Degrees of Freedom: 2
R-squared: 0.0226
Adj R-squared: -0.0046
Rmse: 12.5182
F-stat (1, 36): 0.8321, p-value: 0.3677
Degrees of Freedom: model 1, resid 36
-----------------------Summary of Estimated Coefficients------------------------
Variable Coef Std Err t-stat p-value CI 2.5% CI 97.5%
--------------------------------------------------------------------------------
Yield 146.6702 160.7874 0.91 0.3677 -168.4732 461.8135
intercept -4.6083 6.0652 -0.76 0.4523 -16.4961 7.2795
---------------------------------End of Summary---------------------------------
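Is this a known bug in pandas.ols when attempting rolling regressions? The data set is small, and there is no obvious defect in it that would prevent a rolling regression over 12 to 60 observations in this case.
The full traceback obtained when not running in the debugger: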
File "properties.pyx", line 31, in pandas.lib.cache_readonly.__get__ (pandas/lib.c:28841)
File "/opt/epd/7.3-2_pandas0.12/lib/python2.7/site-packages/pandas/stats/ols.py", line 656, in beta
return DataFrame(self._beta_raw,
File "properties.pyx", line 31, in pandas.lib.cache_readonly.__get__ (pandas/lib.c:28841)
File "/opt/epd/7.3-2_pandas0.12/lib/python2.7/site-packages/pandas/stats/ols.py", line 775, in _beta_raw
beta, indices, mask = self._rolling_ols_call
File "properties.pyx", line 31, in pandas.lib.cache_readonly.__get__ (pandas/lib.c:28841)
File "/opt/epd/7.3-2_pandas0.12/lib/python2.7/site-packages/pandas/stats/ols.py", line 789, in _rolling_ols_call
return self._calc_betas(self._x_trans, self._y_trans)
File "/opt/epd/7.3-2_pandas0.12/lib/python2.7/site-packages/pandas/stats/ols.py", line 803, in _calc_betas
cum_xx = self._cum_xx(x)
File "/opt/epd/7.3-2_pandas0.12/lib/python2.7/site-packages/pandas/stats/ols.py", line 865, in _cum_xx
x_slice = slicer(x, date)
File "/opt/epd/7.3-2_pandas0.12/lib/python2.7/site-packages/pandas/stats/ols.py", line 856, in slicer
return df.values[i:i + 1, :]
TypeError: unsupported operand type(s) for +: 'slice' and 'int'
Added:
The offending code appears to be in this function from ols.py in pandas 0.12:
def _cum_xx(self, x):
    dates = self._index
    K = len(x.columns)
    valid = self._time_has_obs
    cum_xx = []

    slicer = lambda df, dt: df.truncate(dt, dt).values
    if not self._panel_model:
        _get_index = x.index.get_loc

        def slicer(df, dt):
            i = _get_index(dt)
            return df.values[i:i + 1, :]

    last = np.zeros((K, K))

    for i, date in enumerate(dates):
        if not valid[i]:
            cum_xx.append(last)
            continue

        x_slice = slicer(x, date)
        xx = last = last + np.dot(x_slice.T, x_slice)
        cum_xx.append(xx)

    return cum_xx
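_get_index is a proxy for x.index.get_loc, whose documentation says it can return a slice object. But the code that follows assumes that the value i obtained this way is an integer, so that i + 1 makes sense.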
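To make the failure mode concrete, here is a minimal sketch in plain Python. It is not the pandas source: rows, slicer_as_written, and slicer_tolerant are hypothetical names used only for illustration, and the tolerant version is just one possible way to cope, assuming get_loc hands back either an integer or a slice.

```python
# Hypothetical stand-in for df.values: a small list of rows.
rows = [[0.0300], [0.0300], [0.0251]]


def slicer_as_written(values, loc):
    """Mimics values[i:i + 1], which assumes loc is an integer."""
    return values[loc:loc + 1]


def slicer_tolerant(values, loc):
    """Accepts either an integer position or a slice of positions."""
    if isinstance(loc, slice):
        return values[loc]
    return values[loc:loc + 1]


# An integer location works with both versions.
assert slicer_as_written(rows, 2) == [[0.0251]]
assert slicer_tolerant(rows, 2) == [[0.0251]]

# A slice location breaks the arithmetic in the original form.
try:
    slicer_as_written(rows, slice(0, 2))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# The tolerant version returns every row covered by the slice.
both_rows = slicer_tolerant(rows, slice(0, 2))
```

With a slice as the location, error_message carries the same text as the traceback above, which is consistent with get_loc returning a slice for at least one date.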
I found the source for get_loc. It turns out that x.index.get_loc is a proxy for x.index._engine.get_loc. In my case, the relevant _engine_type of the index when the error occurs is just ObjectEngine, defined at this source location, and get_loc is defined there:
cpdef get_loc(self, object val):
    if is_definitely_invalid_key(val):
        raise TypeError

    if self.over_size_threshold and self.is_monotonic:
        if not self.is_unique:
            return self._get_loc_duplicates(val)
        values = self._get_index_values()
        loc = _bin_search(values, val) # .searchsorted(val, side='left')
        if util.get_value_at(values, loc) != val:
            raise KeyError(val)
        return loc

    self._ensure_mapping_populated()
    if not self.unique:
        return self._get_loc_duplicates(val)

    self._check_type(val)

    try:
        return self.mapping.get_item(val)
    except TypeError:
        raise KeyError(val)
I am investigating when and why get_loc returns a slice for me (there are definitely no duplicates in the index, which is the only case in which the documentation suggests this happens). In the meantime, any suggestions along these lines would be helpful.
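One way to narrow this down is to ask the index directly whether it is unique. The sample printed earlier does appear to list a few dates twice (for example 2010-07-30), so the check may be worthwhile. The sketch below uses a handful of those dates; idx is a hypothetical stand-in for df.index, and the behaviour shown is that of recent pandas versions, which is assumed rather than confirmed to match 0.12.

```python
import pandas as pd

# Hypothetical stand-in for df.index, using dates from the printout above.
idx = pd.DatetimeIndex(
    ["2010-06-30", "2010-07-30", "2010-07-30", "2010-08-31"]
)

# Does any label occur more than once, and which ones?
has_duplicates = not idx.is_unique
repeated = idx[idx.duplicated()]

# For a sorted index with a repeated label, get_loc returns a slice
# covering all matching positions rather than a single integer.
loc_single = idx.get_loc(pd.Timestamp("2010-06-30"))
loc_repeated = idx.get_loc(pd.Timestamp("2010-07-30"))
```

If the real index reports has_duplicates as True, dropping or aggregating the repeated dates before the rolling regression would be one thing to try.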