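I have a DataFrame containing stock prices and a market index.
I need to compute a rolling beta for each stock.
I could not get rolling_apply to work, because it seems to accept only a Series as input. Anyway, inspired by "Python pandas calculate rolling stock beta using rolling apply to groupby object in vectorized fashion", I wrote my own function.
The problem is that it takes a long time, and I would like to know whether there is a more efficient / pythonic way to do this.
Consider a minimal working example: a (50, 5) DataFrame containing the prices of 4 stocks and the market index.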
In[143]: df
Out[143]:
stock1 stock2 stock3 stock4 market
2001-01-02 987.42 985.68 992.43 1,000.51 994.99
2001-01-03 981.66 985.73 980.05 994.46 992.10
2001-01-04 1,010.29 1,027.32 1,019.99 990.48 1,019.77
2001-01-05 1,032.80 1,032.26 1,018.85 1,000.33 1,031.99
2001-01-08 1,027.73 1,034.16 1,024.27 1,003.17 1,039.18
2001-01-09 1,023.39 1,031.27 1,018.03 990.56 1,035.55
2001-01-10 1,006.60 1,020.88 996.01 969.00 1,033.35
2001-01-11 1,000.20 1,026.98 987.04 965.93 1,020.67
2001-01-12 997.28 1,026.38 976.39 956.51 1,019.93
2001-01-15 1,011.23 1,029.35 980.54 970.12 1,024.18
2001-01-16 1,001.01 1,022.45 979.40 960.23 1,027.70
2001-01-17 1,032.92 1,049.84 998.17 962.49 1,037.06
2001-01-18 1,039.49 1,046.08 995.25 954.84 1,036.63
2001-01-19 1,060.11 1,032.13 1,005.34 938.28 1,043.32
2001-01-22 1,078.96 1,041.04 1,015.77 940.91 1,035.55
2001-01-23 1,079.30 1,049.06 1,023.83 944.38 1,034.46
2001-01-24 1,071.39 1,058.78 1,025.33 943.67 1,039.62
2001-01-25 1,082.28 1,058.41 1,031.82 944.42 1,053.31
2001-01-26 1,080.65 1,052.52 1,039.75 948.22 1,046.47
2001-01-29 1,067.71 1,059.38 1,030.55 954.92 1,042.02
2001-01-30 1,059.38 1,061.58 1,035.54 956.96 1,035.43
2001-01-31 1,063.28 1,055.93 1,032.00 965.19 1,043.19
2001-02-01 1,066.58 1,038.35 1,051.17 969.88 1,048.01
2001-02-02 1,055.02 1,032.25 1,061.07 970.45 1,049.49
2001-02-05 1,058.69 1,033.56 1,049.81 974.73 1,039.59
2001-02-06 1,077.04 1,042.35 1,053.67 969.53 1,049.52
2001-02-07 1,081.48 1,035.72 1,044.86 973.62 1,050.71
2001-02-08 1,094.43 1,041.44 1,052.92 969.60 1,048.95
2001-02-09 1,081.68 1,032.42 1,055.74 965.29 1,033.96
2001-02-12 1,085.23 1,037.33 1,057.50 968.85 1,038.46
2001-02-13 1,087.73 1,046.19 1,061.14 970.44 1,040.80
2001-02-14 1,106.38 1,039.32 1,061.63 972.74 1,036.39
2001-02-15 1,109.81 1,052.12 1,087.66 981.12 1,044.47
2001-02-16 1,097.76 1,033.98 1,083.21 974.16 1,040.55
2001-02-19 1,105.29 1,030.52 1,086.86 983.31 1,039.09
2001-02-20 1,120.59 1,019.48 1,092.07 981.56 1,039.52
2001-02-21 1,141.82 1,001.32 1,090.09 978.44 1,037.89
2001-02-22 1,133.23 997.16 1,090.30 974.64 1,037.17
2001-02-23 1,117.17 983.76 1,074.18 961.88 1,037.08
2001-02-26 1,110.89 980.26 1,072.09 974.47 1,035.48
2001-02-27 1,108.14 978.90 1,087.26 974.01 1,045.79
2001-02-28 1,112.05 979.13 1,080.28 977.08 1,043.96
2001-03-01 1,119.32 977.34 1,066.85 977.90 1,050.26
2001-03-02 1,114.89 968.49 1,081.41 977.97 1,050.48
2001-03-05 1,116.62 974.88 1,094.89 988.49 1,055.40
2001-03-06 1,129.31 983.38 1,093.19 983.73 1,054.96
2001-03-07 1,138.58 987.68 1,100.89 995.12 1,069.54
2001-03-08 1,148.35 992.66 1,105.43 1,002.95 1,062.42
2001-03-09 1,148.92 979.94 1,101.82 993.69 1,064.08
2001-03-12 1,133.62 957.47 1,094.78 990.28 1,060.41
Here are the functions I am using:
import numpy as np
import pandas as pd

def rolling_betas(df, window, lag=None):
    # one row of betas per date; rows without a full window stay NaN
    roll_betas = pd.DataFrame(np.nan, index=df.index, columns=df.columns)
    for i in range(1, len(df) + 1):
        sub_df = df.iloc[max(i - window, 0):i, :]
        idx = sub_df.index[-1]
        if len(sub_df) >= window:
            # label-based row assignment
            roll_betas.loc[idx] = get_betas(sub_df)
    if lag:
        roll_betas = roll_betas.shift(periods=lag)
    return roll_betas

def get_betas(subset):
    '''Apply to a DataFrame of prices whose leading columns are
    assets and whose last column is the market.'''
    subset = pd.DataFrame(subset)
    # covariance matrix of the returns, computed once
    cov = np.cov(subset.pct_change().dropna(), rowvar=False)
    # last row: covariances with the market; its last entry: market variance
    return cov[-1] / cov[-1][-1]
Here is how long it takes on this subset:
%time betas = rolling_betas(df, 3)
Wall time: 1.3 s
That may not look long, but on larger DataFrames it can take several minutes, which is far from optimal.
Thanks,
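One loop-free direction, offered only as a sketch: beta is cov(r_i, r_m) / var(r_m), so it can be assembled from pandas' built-in rolling covariance and variance instead of calling np.cov once per window. The function name rolling_betas_vectorized and the market_col parameter are hypothetical names introduced for illustration. The sketch assumes that a window of `window` prices should correspond to `window - 1` returns, as in the loop-based version, and it has not been timed against the data above.

```python
import numpy as np
import pandas as pd

def rolling_betas_vectorized(prices, window, market_col="market", lag=None):
    """Rolling beta of every column against `market_col`, without a Python loop.

    Hypothetical helper: `window` counts price rows, as in the loop-based
    version, so the statistics are taken over `window - 1` returns.
    """
    returns = prices.pct_change()
    n = window - 1
    # rolling covariance of each column's returns with the market's returns
    cov = returns.rolling(n).cov(returns[market_col])
    # rolling variance of the market's returns (sample variance, ddof=1)
    var = returns[market_col].rolling(n).var()
    # divide row by row: beta_i = cov(r_i, r_m) / var(r_m)
    betas = cov.div(var, axis=0)
    if lag:
        betas = betas.shift(periods=lag)
    return betas
```

With the DataFrame from the question this would be called as betas = rolling_betas_vectorized(df, 3). Rows that do not yet have a full window come back as NaN, and the market column is 1 up to floating-point error, which mirrors what the loop produces.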