Question

I have a DataFrame containing stock prices and a market index.

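I need to compute a rolling beta for each stock. I did not manage to use rolling_apply, because it seems to accept only a Series as input. Instead, inspired by "Python pandas calculate rolling stock beta using rolling apply to groupby object in vectorized fashion", I wrote my own function.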

The problem is that it takes a long time, and I would like to know whether there is a more efficient / pythonic way to do this.

Consider a minimal working example: a (50, 5) DataFrame containing the prices of 4 stocks and the market index.

In[143]: df
Out[143]: 
             stock1   stock2   stock3   stock4   market
2001-01-02   987.42   985.68   992.43 1,000.51   994.99
2001-01-03   981.66   985.73   980.05   994.46   992.10
2001-01-04 1,010.29 1,027.32 1,019.99   990.48 1,019.77
2001-01-05 1,032.80 1,032.26 1,018.85 1,000.33 1,031.99
2001-01-08 1,027.73 1,034.16 1,024.27 1,003.17 1,039.18
2001-01-09 1,023.39 1,031.27 1,018.03   990.56 1,035.55
2001-01-10 1,006.60 1,020.88   996.01   969.00 1,033.35
2001-01-11 1,000.20 1,026.98   987.04   965.93 1,020.67
2001-01-12   997.28 1,026.38   976.39   956.51 1,019.93
2001-01-15 1,011.23 1,029.35   980.54   970.12 1,024.18
2001-01-16 1,001.01 1,022.45   979.40   960.23 1,027.70
2001-01-17 1,032.92 1,049.84   998.17   962.49 1,037.06
2001-01-18 1,039.49 1,046.08   995.25   954.84 1,036.63
2001-01-19 1,060.11 1,032.13 1,005.34   938.28 1,043.32
2001-01-22 1,078.96 1,041.04 1,015.77   940.91 1,035.55
2001-01-23 1,079.30 1,049.06 1,023.83   944.38 1,034.46
2001-01-24 1,071.39 1,058.78 1,025.33   943.67 1,039.62
2001-01-25 1,082.28 1,058.41 1,031.82   944.42 1,053.31
2001-01-26 1,080.65 1,052.52 1,039.75   948.22 1,046.47
2001-01-29 1,067.71 1,059.38 1,030.55   954.92 1,042.02
2001-01-30 1,059.38 1,061.58 1,035.54   956.96 1,035.43
2001-01-31 1,063.28 1,055.93 1,032.00   965.19 1,043.19
2001-02-01 1,066.58 1,038.35 1,051.17   969.88 1,048.01
2001-02-02 1,055.02 1,032.25 1,061.07   970.45 1,049.49
2001-02-05 1,058.69 1,033.56 1,049.81   974.73 1,039.59
2001-02-06 1,077.04 1,042.35 1,053.67   969.53 1,049.52
2001-02-07 1,081.48 1,035.72 1,044.86   973.62 1,050.71
2001-02-08 1,094.43 1,041.44 1,052.92   969.60 1,048.95
2001-02-09 1,081.68 1,032.42 1,055.74   965.29 1,033.96
2001-02-12 1,085.23 1,037.33 1,057.50   968.85 1,038.46
2001-02-13 1,087.73 1,046.19 1,061.14   970.44 1,040.80
2001-02-14 1,106.38 1,039.32 1,061.63   972.74 1,036.39
2001-02-15 1,109.81 1,052.12 1,087.66   981.12 1,044.47
2001-02-16 1,097.76 1,033.98 1,083.21   974.16 1,040.55
2001-02-19 1,105.29 1,030.52 1,086.86   983.31 1,039.09
2001-02-20 1,120.59 1,019.48 1,092.07   981.56 1,039.52
2001-02-21 1,141.82 1,001.32 1,090.09   978.44 1,037.89
2001-02-22 1,133.23   997.16 1,090.30   974.64 1,037.17
2001-02-23 1,117.17   983.76 1,074.18   961.88 1,037.08
2001-02-26 1,110.89   980.26 1,072.09   974.47 1,035.48
2001-02-27 1,108.14   978.90 1,087.26   974.01 1,045.79
2001-02-28 1,112.05   979.13 1,080.28   977.08 1,043.96
2001-03-01 1,119.32   977.34 1,066.85   977.90 1,050.26
2001-03-02 1,114.89   968.49 1,081.41   977.97 1,050.48
2001-03-05 1,116.62   974.88 1,094.89   988.49 1,055.40
2001-03-06 1,129.31   983.38 1,093.19   983.73 1,054.96
2001-03-07 1,138.58   987.68 1,100.89   995.12 1,069.54
2001-03-08 1,148.35   992.66 1,105.43 1,002.95 1,062.42
2001-03-09 1,148.92   979.94 1,101.82   993.69 1,064.08
2001-03-12 1,133.62   957.47 1,094.78   990.28 1,060.41

Here are the functions I am using:

import numpy as np
import pandas as pd

def rolling_betas(df, window, lag=None):
    # One row of betas per date; rows before the first full window stay NaN
    roll_betas = pd.DataFrame(np.nan, index=df.index, columns=df.columns)
    for i in range(1, len(df)+1):
        sub_df = df.iloc[max(i-window, 0):i, :]
        idx = sub_df.index[-1]
        if len(sub_df) >= window:
            # .loc replaces .ix, which was removed from pandas
            roll_betas.loc[idx] = get_betas(sub_df)
    if lag:
        roll_betas = roll_betas.shift(periods=lag)
    return roll_betas

def get_betas(subset):
    ''' apply to a DataFrame of prices whose last column is the market;
        returns the beta of every column against that last column'''
    subset = pd.DataFrame(subset)
    # Covariance matrix of the returns, computed once
    cov = np.cov(subset.pct_change().dropna(), rowvar=False)
    return cov[-1] / cov[-1][-1]

Here is how long it takes on this small sample:

%time betas = rolling_betas(df, 3)
Wall time: 1.3 s

That may not look like much, but on larger DataFrames it can take several minutes, which is far from optimal.
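For reference, here is a sketch of one way the Python-level loop might be avoided. Beta is the covariance of a stock's returns with the market's returns divided by the variance of the market's returns, so both pieces can be taken from pandas rolling windows. The names rolling_betas_vectorized and market_col are hypothetical, and the sketch assumes a pandas version in which DataFrame.rolling(...).cov(other) accepts a Series. As in rolling_betas, window counts price rows, so each beta is estimated from window - 1 returns.

```python
import numpy as np
import pandas as pd


def rolling_betas_vectorized(prices, window, market_col="market", lag=None):
    """Rolling beta of every column against `market_col` (hypothetical helper).

    `window` counts price rows, so each estimate uses `window - 1` returns.
    """
    returns = prices.pct_change()
    n_returns = window - 1
    market = returns[market_col]
    # Rolling covariance of each column's returns with the market returns
    cov = returns.rolling(n_returns).cov(market)
    # Rolling variance of the market returns
    var = market.rolling(n_returns).var()
    # Divide every column of cov by the market variance, row by row
    betas = cov.div(var, axis=0)
    if lag:
        betas = betas.shift(periods=lag)
    return betas


# Synthetic prices for illustration only (not the data printed above)
rng = np.random.default_rng(0)
dates = pd.bdate_range("2001-01-02", periods=50)
cols = ["stock1", "stock2", "stock3", "stock4", "market"]
steps = 1 + rng.normal(0, 0.01, size=(50, 5))
demo = pd.DataFrame(1000 * np.cumprod(steps, axis=0), index=dates, columns=cols)

demo_betas = rolling_betas_vectorized(demo, window=3)
```

The first window - 1 rows of the result are NaN, as with the loop version, and the market column is 1 up to floating-point error. I have not timed this on large frames, so any speed-up is an expectation rather than a measurement.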

Thanks,

Python pandas: calculate rolling beta on a DataFrame in a pythonic way
