Computing rolling correlation with pandas

Time: 2014-11-21 19:38:06

Tags: python pandas

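I have a list of 10 stocks, identified by PERMNO. I want to group these stocks by PERMNO and compute, for each PERMNO, the rolling correlation between the stock return (RET) and the market return (vwretd). The code I am trying is below.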

CRSP['rollingcorr'] = CRSP.groupby('PERMNO').rolling_corr(CRSP['RET'],CRSP['vwretd'],10)

The error I get is below.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-32-c18e1ce01302> in <module>()
      1 #CRSP['rollingcorr'] = CRSP.rolling_corr(CRSP['vwretd'],CRSP['RET'],120)
----> 2 CRSP['rollingmean'] = CRSP.groupby('PERMNO').rolling_corr(CRSP['vwretd'],10)
      3 CRSP.head(20)

C:\Users\rebortz\Anaconda\lib\site-packages\pandas\core\groupby.pyc in __getattr__(self, attr)
    296 
    297         raise AttributeError("%r object has no attribute %r" %
--> 298                              (type(self).__name__, attr))
    299 
    300     def __getitem__(self, key):

AttributeError: 'DataFrameGroupBy' object has no attribute 'rolling_corr'

Please help!

Thanks

3 answers:

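Answer 0: (score: 9)

Running pd.rolling_corr() under Python 3.5 with a newer pandas generates a warning that the function is deprecated and may stop working in the future (later pandas releases removed it entirely). The recommended replacement is Series.rolling(window=<period>).corr(other=series). For example: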

data['scrip1DailyReturn'].rolling(window=90).corr(other=data['scrip2DailyReturn'])
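
To make the window semantics concrete, here is a minimal sketch, assuming made-up random returns (the column names are borrowed from the line above, the data is not real). It checks that each rolling value is just the ordinary Pearson correlation over the most recent `window` rows, and that the first `window - 1` entries are NaN.

```python
import numpy as np
import pandas as pd

# Made-up return series, used only to have something to correlate.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "scrip1DailyReturn": rng.normal(0.0, 0.01, 200),
    "scrip2DailyReturn": rng.normal(0.0, 0.01, 200),
})

window = 90
rolling = data["scrip1DailyReturn"].rolling(window=window).corr(
    other=data["scrip2DailyReturn"]
)

# The last rolling value should match a plain correlation over the last 90 rows.
tail = data.iloc[-window:]
manual = tail["scrip1DailyReturn"].corr(tail["scrip2DailyReturn"])

print(rolling.iloc[window - 2 : window + 1])  # one NaN, then the first valid values
print(rolling.iloc[-1], manual)
```

Note that this form works on a single pair of series; it does not by itself restart the window for each group, which is what the grouped question needs.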

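Answer 1: (score: 3)

Use pandas.rolling_corr, not DataFrame.rolling_corr. Also, groupby does not return a single DataFrame: it returns a GroupBy object that you iterate over as (key, group) pairs. See the code below.

Code: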

import pandas as pd

df = pd.read_csv("color.csv")
df_gen = df.copy().groupby("Color")

for key, value in df_gen:
    print("key: {}".format(key))
    # rolling_corr is a module-level pandas function, not a DataFrame method
    print(pd.rolling_corr(value["Value1"], value["Value2"], 3))

Output:

key: Blue
1          NaN
3          NaN
6     0.931673
8     0.865066
10    0.089304
12   -0.998656
15   -0.971373
17   -0.667316
dtype: float64
key: Red
0          NaN
2          NaN
5    -0.911357
9    -0.152221
11   -0.971153
14    0.438697
18   -0.550727
dtype: float64
key: Yellow
4          NaN
7          NaN
13   -0.040330
16    0.879371
dtype: float64

You can change the loop to the following to see each group of the original DataFrame with the new column added.

for key, value in df_gen:
    print("key: {}".format(key))
    value["ROLL_CORR"] = pd.rolling_corr(value["Value1"], value["Value2"], 3)
    print(value)

Output:

key: Blue
   Color    Value1    Value2  ROLL_CORR
1   Blue  0.951227  0.514999        NaN
3   Blue  0.649112  0.513052        NaN
6   Blue  0.148165  0.342205   0.931673
8   Blue  0.626883  0.421530   0.865066
10  Blue  0.286738  0.583811   0.089304
12  Blue  0.966779  0.227340  -0.998656
15  Blue  0.065493  0.887640  -0.971373
17  Blue  0.757932  0.900103  -0.667316
key: Red
   Color    Value1    Value2  ROLL_CORR
0    Red  0.201435  0.981871        NaN
2    Red  0.522955  0.357239        NaN
5    Red  0.806326  0.310039  -0.911357
9    Red  0.656126  0.678047  -0.152221
11   Red  0.435898  0.908388  -0.971153
14   Red  0.116419  0.555821   0.438697
18   Red  0.793102  0.168033  -0.550727
key: Yellow
     Color    Value1    Value2  ROLL_CORR
4   Yellow  0.099474  0.143293        NaN
7   Yellow  0.073128  0.749297        NaN
13  Yellow  0.006777  0.318383  -0.040330
16  Yellow  0.345647  0.993382   0.879371

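If you want to join them all back together after processing (which, by the way, may confuse other readers), just use concat once the groups have been processed: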

import pandas as pd

df = pd.read_csv("color.csv")
df_gen = df.copy().groupby("Color")

dfs = [] # Container for dataframes.

for key, value in df_gen:
    value["ROLL_CORR"] = pd.rolling_corr(value["Value1"], value["Value2"], 3)
    print(value)
    dfs.append(value)

df_final = pd.concat(dfs)
print(df_final)

Output:

     Color    Value1    Value2  ROLL_CORR
1     Blue  0.951227  0.514999        NaN
3     Blue  0.649112  0.513052        NaN
6     Blue  0.148165  0.342205   0.931673
8     Blue  0.626883  0.421530   0.865066
10    Blue  0.286738  0.583811   0.089304
12    Blue  0.966779  0.227340  -0.998656
15    Blue  0.065493  0.887640  -0.971373
17    Blue  0.757932  0.900103  -0.667316
0      Red  0.201435  0.981871        NaN
2      Red  0.522955  0.357239        NaN
5      Red  0.806326  0.310039  -0.911357
9      Red  0.656126  0.678047  -0.152221
11     Red  0.435898  0.908388  -0.971153
14     Red  0.116419  0.555821   0.438697
18     Red  0.793102  0.168033  -0.550727
4   Yellow  0.099474  0.143293        NaN
7   Yellow  0.073128  0.749297        NaN
13  Yellow  0.006777  0.318383  -0.040330
16  Yellow  0.345647  0.993382   0.879371

Hope this helps.
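
Since pd.rolling_corr is no longer available in current pandas, the same loop-then-concat idea can be sketched with the Series.rolling(...).corr(...) form. This is only an illustration under assumptions: the frame below is made-up random data standing in for color.csv (which is not shown in this thread), and it relies on the index being unique so the concatenated result lines up with the original rows.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for color.csv; only the column names follow the answer.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "Color": ["Red", "Blue", "Red", "Blue", "Yellow", "Red", "Blue", "Yellow",
              "Blue", "Red", "Yellow", "Yellow"],
    "Value1": rng.random(12),
    "Value2": rng.random(12),
})

window = 3
pieces = []
for key, group in df.groupby("Color"):
    # Each group gets its own window, so values never leak across colors.
    pieces.append(group["Value1"].rolling(window=window).corr(group["Value2"]))

# concat keeps the original index labels, so the assignment aligns row by row.
df["ROLL_CORR"] = pd.concat(pieces)
print(df.sort_values("Color", kind="stable"))
```

Collecting Series rather than modified group frames avoids writing into the group copies, and the original row order of df is preserved.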

Answer 2: (score: 1)

I found a solution that works. It is quite simple.

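import pandas as pd

def roll_corr_groupby(x, i):
    # i is the rolling window size
    x['Z'] = pd.rolling_corr(x['col 1'], x['col 2'], i)
    return x

# Extra arguments to apply() are forwarded to the function;
# i must be set to the window size before this call.
x = x.groupby(['key']).apply(roll_corr_groupby, i)
x.head()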