Pandas expanding apply for regression beta

Time: 2017-07-06 12:18:51

Tags: python pandas apply statsmodels

Hi, I am trying to compute a regression beta over an expanding window in pandas. I have the following function to compute beta:

  def beta(row, col1, col2):
      return numpy.cov(row[col1],row[col2]) / numpy.var(row[col1])

I have tried the following to get an expanding beta on my DataFrame df:

pandas.expanding_apply(df, beta, col1='col1', col2='col2')
pandas.expanding_apply(df, beta, kwargs={'col1':'col1', 'col2':'col2'})
df.expanding.apply(...)

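However, none of them work. I either get a message saying the kwargs were not passed or, if I hard-code the column names in the beta function, I get: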

*** IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

Thanks

Example:

def beta(row, col1, col2):
    return numpy.cov(row[col1],row[col2]) / numpy.var(row[col1])
df = pandas.DataFrame({'a':[1,2,3,4,5],'b':[.1,5,.3,.5,6]})
pandas.expanding_apply(df, beta, col1='a', col2='b')
pandas.expanding_apply(df, beta, kwargs={'col1':'a', 'col2':'b'})

Both of these return errors.

1 Answer:

Answer 0: (score: 1)

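I ran into this problem when trying to compute the beta for a rolling multivariate regression, which is very similar to what you are doing (see here). The key issue is that, in Expanding.apply(func, args=(), kwargs={}), the func parameter: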

  

must produce a single value from an ndarray input; *args and **kwargs are passed to the function.

[source]

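So there is really no way to do this with expanding.apply: on a DataFrame it calls func on each column separately, handing it only that column's values for the window, so the function never sees two columns at once. (Note: as mentioned above, expanding_apply is deprecated.)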
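To make the limitation concrete, here is a minimal sketch (not part of the original answer) that records what apply hands to the function. The helper name probe and the list seen_shapes are hypothetical, and raw=True assumes a pandas version that supports that argument (0.23 or later).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [.1, 5, .3, .5, 6]})

seen_shapes = []  # hypothetical helper list: shapes of every window passed in

def probe(window):
    # Record the shape of what apply actually hands to the function.
    seen_shapes.append(window.shape)
    # The function must reduce the window to a single scalar.
    return window.mean()

# raw=True asks pandas to pass a plain NumPy array for each window.
result = df.expanding(min_periods=1).apply(probe, raw=True)

# Every window is one-dimensional: one column at a time, never both.
print(all(len(shape) == 1 for shape in seen_shapes))
print(result)
```

Because each call only receives a 1-D slice of a single column, an expression such as row['a'] inside the function has nothing to index into, which is consistent with the IndexError in the question.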

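Here is a workaround. It is more computationally expensive (it will eat memory), but it will get you the output. It builds a list of expanding-window NumPy arrays and then computes the beta for each array.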

from pandas_datareader.data import DataReader as dr
import numpy as np
import pandas as pd

df = (dr(['GOOG', 'SPY'], 'google')['Close']
      .pct_change()
      .dropna())

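# i is the asset, m is the market/index
# [0, 1] grabs cov(i, m) from the covariance matrix
# Note: np.cov defaults to ddof=1 while np.var defaults to ddof=0, so this
# ratio is n/(n-1) times the OLS slope; pass ddof=1 to np.var to match.
def beta(i, m):
    return np.cov(i, m)[0, 1] / np.var(m)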

def expwins(x, min_periods):
    return [x[:i] for i in range(min_periods, x.shape[0] + 1)]

# Example:
# arr = np.arange(10).reshape(5, 2)
# print(expwins(arr, min_periods=3)[1])  # the 2nd window of the set
# [[0 1]
#  [2 3]
#  [4 5]
#  [6 7]]

min_periods = 21
# Create "blocks" of expanding windows
wins = expwins(df.values, min_periods=min_periods)
# Calculate a beta (single scalar val.) for each
betas = [beta(win[:, 0], win[:, 1]) for win in wins]
betas = pd.Series(betas, index=df.index[min_periods - 1:])

print(betas)
Date
2010-02-03    0.77572
2010-02-04    0.74769
2010-02-05    0.76692
2010-02-08    0.74301
2010-02-09    0.74741
2010-02-10    0.74635
2010-02-11    0.74735
2010-02-12    0.74605
2010-02-16    0.78521
2010-02-17    0.77619
2010-02-18    0.79188
2010-02-19    0.78952

2017-06-19    0.97387
2017-06-20    0.97390
2017-06-21    0.97386
2017-06-22    0.97387
2017-06-23    0.97391
2017-06-26    0.97389
2017-06-27    0.97482
2017-06-28    0.97508
2017-06-29    0.97594
2017-06-30    0.97584
2017-07-03    0.97575
2017-07-05    0.97588
dtype: float64
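
As a possible alternative that is not part of the original answer, pandas' built-in expanding covariance and variance can be divided to get the same quantity without materializing every window. The sketch below uses synthetic returns in place of the GOOG/SPY data (the column names asset and market and all values are hypothetical) and uses ddof=1 for both the covariance and the variance, so its numbers would differ slightly from the output above, which mixes ddof=1 and ddof=0.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns stand in for the downloaded price data (hypothetical).
rng = np.random.default_rng(0)
market = rng.normal(0, 0.01, 250)
asset = 0.9 * market + rng.normal(0, 0.005, 250)
rets = pd.DataFrame({'asset': asset, 'market': market})

min_periods = 21

# Expanding covariance of asset with market, and expanding variance of market.
# Both default to ddof=1, so the ratio is the usual OLS slope.
cov_am = rets['asset'].expanding(min_periods=min_periods).cov(rets['market'])
var_m = rets['market'].expanding(min_periods=min_periods).var()
betas_fast = (cov_am / var_m).dropna()

# Cross-check against the explicit list-of-windows approach.
def beta(i, m):
    return np.cov(i, m)[0, 1] / np.var(m, ddof=1)

vals = rets.values
betas_loop = pd.Series(
    [beta(vals[:k, 0], vals[:k, 1]) for k in range(min_periods, len(vals) + 1)],
    index=rets.index[min_periods - 1:],
)

print(np.allclose(betas_fast.values, betas_loop.values))
```

The built-in version avoids holding a separate array per window in memory, which is the main cost of the workaround above.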