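Hi, I'm trying to compute an expanding-window regression beta in pandas. I have the following function to compute beta:
def beta(row, col1, col2):
    return numpy.cov(row[col1],row[col2]) / numpy.var(row[col1])
I have tried the following on my DataFrame df: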
pandas.expanding_apply(df, beta, col1='col1', col2='col2')
pandas.expanding_apply(df, beta, kwargs={'col1':'col1', 'col2':'col2'})
df.expanding.apply(...)
However, none of them work. Either I get an error saying the kwargs were not passed, or, if I hard-code the column names in the beta function, I get:
*** IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
Thanks.
Example:
def beta(row, col1, col2):
    return numpy.cov(row[col1],row[col2]) / numpy.var(row[col1])
df = pandas.DataFrame({'a':[1,2,3,4,5],'b':[.1,5,.3,.5,6]})
pandas.expanding_apply(df, beta, col1='a', col2='b')
pandas.expanding_apply(df, beta, kwargs={'col1':'a', 'col2':'b'})
Both of these return errors.
Answer 0 (score: 1)
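I ran into this problem when trying to compute the beta for a rolling multivariate regression, which is very similar to what you're doing (see here). The key issue is that with Expanding.apply(func, args=(), kwargs={}), the func param
must produce a single value from an ndarray input. *args and **kwargs are passed to the function
[source]
So there is really no way to do this with expanding.apply, because func receives one column's values at a time rather than the whole DataFrame. (Note: as mentioned, expanding_apply is deprecated.)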
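To make the limitation above concrete, here is a minimal sketch that records what the function actually receives. It assumes a pandas version that provides the `.expanding()` method and the `raw` argument of `apply`; the helper name `probe` and the list `seen` are hypothetical and only used for illustration, and the data is the small frame from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [.1, 5, .3, .5, 6]})

seen = []  # hypothetical log of what each call receives

def probe(window):
    # Record the type, number of dimensions and length of the input
    seen.append((type(window).__name__, window.ndim, len(window)))
    # apply() requires a single scalar back from each call
    return window.sum()

# raw=True asks pandas to pass plain NumPy arrays instead of Series
result = df.expanding(min_periods=1).apply(probe, raw=True)

print(seen[:3])
print(result)
```

Every call gets a one-dimensional array holding a single column's window, so indexing it with a column name such as 'a' fails, and there is no point at which both columns are visible together.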
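Here is a workaround. It is more computationally expensive (it will eat up memory), but it will get you the output. It builds a list of expanding-window NumPy arrays and then computes the beta for each one.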
from pandas_datareader.data import DataReader as dr
import numpy as np
import pandas as pd

df = (dr(['GOOG', 'SPY'], 'google')['Close']
      .pct_change()
      .dropna())
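# i is the asset, m is the market/index
# [0, 1] picks cov(i, m) out of the 2x2 covariance matrix
# Note: np.cov defaults to ddof=1 while np.var defaults to ddof=0;
# pass ddof=1 to np.var if both estimates should use the same normalization
def beta(i, m):
    return np.cov(i, m)[0, 1] / np.var(m)

def expwins(x, min_periods):
    # One slice per window: x[:min_periods], x[:min_periods + 1], ..., x[:len(x)]
    return [x[:i] for i in range(min_periods, x.shape[0] + 1)]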
# Example:
# arr = np.arange(10).reshape(5, 2)
# print(expwins(arr, min_periods=3)[1])  # the 2nd window of the set
# array([[0, 1],
#        [2, 3],
#        [4, 5],
#        [6, 7]])
min_periods = 21
# Create "blocks" of expanding windows
wins = expwins(df.values, min_periods=min_periods)
# Calculate a beta (single scalar val.) for each
betas = [beta(win[:, 0], win[:, 1]) for win in wins]
betas = pd.Series(betas, index=df.index[min_periods - 1:])
print(betas)
Date
2010-02-03 0.77572
2010-02-04 0.74769
2010-02-05 0.76692
2010-02-08 0.74301
2010-02-09 0.74741
2010-02-10 0.74635
2010-02-11 0.74735
2010-02-12 0.74605
2010-02-16 0.78521
2010-02-17 0.77619
2010-02-18 0.79188
2010-02-19 0.78952
...
2017-06-19 0.97387
2017-06-20 0.97390
2017-06-21 0.97386
2017-06-22 0.97387
2017-06-23 0.97391
2017-06-26 0.97389
2017-06-27 0.97482
2017-06-28 0.97508
2017-06-29 0.97594
2017-06-30 0.97584
2017-07-03 0.97575
2017-07-05 0.97588
dtype: float64
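As an alternative to building the windows by hand, the same kind of expanding beta can be sketched with the built-in `Expanding.cov` and `Expanding.var` methods, which avoid `apply` altogether. This is a different technique from the list-of-windows workaround above, not a restatement of it. The data here is synthetic (random numbers with a made-up true beta of 0.8), and the column names `asset` and `market` are assumptions for illustration. Both built-in methods default to ddof=1, so the loop used for the cross-check passes ddof=1 to `np.var` as well.

```python
import numpy as np
import pandas as pd

# Synthetic returns, for illustration only
rng = np.random.default_rng(0)
market = rng.normal(0, 0.01, 60)
asset = 0.8 * market + rng.normal(0, 0.005, 60)
df = pd.DataFrame({'asset': asset, 'market': market})

min_periods = 21

# Expanding covariance of asset with market, and expanding variance of market
exp_cov = df['asset'].expanding(min_periods=min_periods).cov(df['market'])
exp_var = df['market'].expanding(min_periods=min_periods).var()
betas_builtin = (exp_cov / exp_var).dropna()

# Cross-check against an explicit loop over expanding windows
def beta(i, m):
    return np.cov(i, m)[0, 1] / np.var(m, ddof=1)

vals = df.values
betas_loop = pd.Series(
    [beta(vals[:k, 0], vals[:k, 1]) for k in range(min_periods, len(df) + 1)],
    index=df.index[min_periods - 1:],
)

print(np.allclose(betas_builtin.values, betas_loop.values))
```

The first `min_periods - 1` rows are NaN before `dropna()`, so the result lines up with the index slice `df.index[min_periods - 1:]` used in the workaround.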