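I have a list of 10 stocks, each identified by its PERMNO. I want to group these stocks by PERMNO and, for each PERMNO, compute the rolling correlation between the stock return (RET) and the market return (vwretd). Here is the code I'm trying: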
CRSP['rollingcorr'] = CRSP.groupby('PERMNO').rolling_corr(CRSP['RET'],CRSP['vwretd'],10)
This is the error I get:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-32-c18e1ce01302> in <module>()
1 #CRSP['rollingcorr'] = CRSP.rolling_corr(CRSP['vwretd'],CRSP['RET'],120)
----> 2 CRSP['rollingmean'] = CRSP.groupby('PERMNO').rolling_corr(CRSP['vwretd'],10)
3 CRSP.head(20)
C:\Users\rebortz\Anaconda\lib\site-packages\pandas\core\groupby.pyc in __getattr__(self, attr)
296
297 raise AttributeError("%r object has no attribute %r" %
--> 298 (type(self).__name__, attr))
299
300 def __getitem__(self, key):
AttributeError: 'DataFrameGroupBy' object has no attribute 'rolling_corr'
Please help!
Answer 0 (score: 9)
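Running rolling_corr() on Python 3.5 produces a warning that the function is deprecated and may stop working in the future. The deprecation comes from pandas, not Python: the top-level pd.rolling_* functions were removed entirely in pandas 0.23. The recommended replacement is Series.rolling(window=<period>).corr(other=series).
For example: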
data['scrip1DailyReturn'].rolling(window=90).corr(other=data['scrip2DailyReturn'])
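A quick way to see what the replacement computes, assuming a recent pandas and NumPy: each value of s1.rolling(window).corr(s2) is the Pearson correlation of the last `window` paired observations, so the last value should match np.corrcoef on the same slice. The series names `ret` and `mkt` and their values are made up for this sketch.

```python
import numpy as np
import pandas as pd

# Two small illustrative return series (values made up for this sketch)
ret = pd.Series([0.010, -0.020, 0.015, 0.030, -0.010, 0.005, 0.020])
mkt = pd.Series([0.008, -0.015, 0.010, 0.020, -0.012, 0.001, 0.012])

window = 3
# Method-based replacement for the removed pd.rolling_corr(ret, mkt, window)
roll = ret.rolling(window=window).corr(other=mkt)

# The first window - 1 values are NaN because the window is not yet full
print(roll)

# The last value is the Pearson correlation of the last 3 (ret, mkt) pairs
manual = np.corrcoef(ret.iloc[-window:], mkt.iloc[-window:])[0, 1]
print(np.isclose(roll.iloc[-1], manual))
```

For the PERMNO question, the same call is made once per group, for example inside a loop over the groups or inside GroupBy.apply.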
Answer 1 (score: 3)
Use pandas.rolling_corr, not DataFrame.rolling_corr. Also, groupby returns an iterable GroupBy object that yields one (key, sub-DataFrame) pair per group. See the code below.
Code:
import pandas as pd

df = pd.read_csv("color.csv")
df_gen = df.copy().groupby("Color")
for key, value in df_gen:
    print("key: {}".format(key))
    # rolling_corr is a top-level pandas function (pandas < 0.23), not a DataFrame method
    print(pd.rolling_corr(value["Value1"], value["Value2"], 3))
Output:
key: Blue
1 NaN
3 NaN
6 0.931673
8 0.865066
10 0.089304
12 -0.998656
15 -0.971373
17 -0.667316
dtype: float64
key: Red
0 NaN
2 NaN
5 -0.911357
9 -0.152221
11 -0.971153
14 0.438697
18 -0.550727
dtype: float64
key: Yellow
4 NaN
7 NaN
13 -0.040330
16 0.879371
dtype: float64
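The same per-group loop on current pandas, where pd.rolling_corr no longer exists, can be sketched with Series.rolling(...).corr(...). Iterating over the GroupBy object hands the loop each key together with an ordinary DataFrame containing only that key's rows. The small table below is a hypothetical stand-in for color.csv, not its real contents.

```python
import pandas as pd

# Hypothetical stand-in for color.csv (values made up for this sketch)
df = pd.DataFrame({
    "Color":  ["Red", "Blue", "Red", "Blue", "Red", "Blue", "Red", "Blue"],
    "Value1": [0.20, 0.95, 0.52, 0.65, 0.81, 0.15, 0.66, 0.63],
    "Value2": [0.98, 0.51, 0.36, 0.51, 0.31, 0.34, 0.68, 0.42],
})

results = {}
for key, group in df.groupby("Color"):
    # Each group is a DataFrame holding only that color's rows, original index kept
    corr = group["Value1"].rolling(window=3).corr(group["Value2"])
    results[key] = corr
    print(f"key: {key}")
    print(corr)
```

Because each group keeps the original row labels, the per-group results line up with the rows of df, just as the index values in the output above do.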
You can change the loop to the following to see each group's original data with the new column added:
for key, value in df_gen:
    value["ROLL_CORR"] = pd.rolling_corr(value["Value1"], value["Value2"], 3)
    print(value)
Output:
Color Value1 Value2 ROLL_CORR
1 Blue 0.951227 0.514999 NaN
3 Blue 0.649112 0.513052 NaN
6 Blue 0.148165 0.342205 0.931673
8 Blue 0.626883 0.421530 0.865066
10 Blue 0.286738 0.583811 0.089304
12 Blue 0.966779 0.227340 -0.998656
15 Blue 0.065493 0.887640 -0.971373
17 Blue 0.757932 0.900103 -0.667316
key: Red
Color Value1 Value2 ROLL_CORR
0 Red 0.201435 0.981871 NaN
2 Red 0.522955 0.357239 NaN
5 Red 0.806326 0.310039 -0.911357
9 Red 0.656126 0.678047 -0.152221
11 Red 0.435898 0.908388 -0.971153
14 Red 0.116419 0.555821 0.438697
18 Red 0.793102 0.168033 -0.550727
key: Yellow
Color Value1 Value2 ROLL_CORR
4 Yellow 0.099474 0.143293 NaN
7 Yellow 0.073128 0.749297 NaN
13 Yellow 0.006777 0.318383 -0.040330
16 Yellow 0.345647 0.993382 0.879371
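If you want to combine all the groups back together after processing (which, by the way, may confuse other readers), just call pd.concat on the processed groups.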
import pandas as pd

df = pd.read_csv("color.csv")
df_gen = df.copy().groupby("Color")
dfs = []  # Container for the per-group DataFrames.
for key, value in df_gen:
    value["ROLL_CORR"] = pd.rolling_corr(value["Value1"], value["Value2"], 3)
    print(value)
    dfs.append(value)
df_final = pd.concat(dfs)
print(df_final)
Output:
Color Value1 Value2 ROLL_CORR
1 Blue 0.951227 0.514999 NaN
3 Blue 0.649112 0.513052 NaN
6 Blue 0.148165 0.342205 0.931673
8 Blue 0.626883 0.421530 0.865066
10 Blue 0.286738 0.583811 0.089304
12 Blue 0.966779 0.227340 -0.998656
15 Blue 0.065493 0.887640 -0.971373
17 Blue 0.757932 0.900103 -0.667316
0 Red 0.201435 0.981871 NaN
2 Red 0.522955 0.357239 NaN
5 Red 0.806326 0.310039 -0.911357
9 Red 0.656126 0.678047 -0.152221
11 Red 0.435898 0.908388 -0.971153
14 Red 0.116419 0.555821 0.438697
18 Red 0.793102 0.168033 -0.550727
4 Yellow 0.099474 0.143293 NaN
7 Yellow 0.073128 0.749297 NaN
13 Yellow 0.006777 0.318383 -0.040330
16 Yellow 0.345647 0.993382 0.879371
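pd.concat stacks the groups one after another, so df_final is ordered by Color rather than by the original rows. A minimal sketch for current pandas (using Series.rolling(...).corr(...) in place of the removed pd.rolling_corr) that builds the same column and then calls sort_index() to restore the original row order; the data frame is a hypothetical stand-in for color.csv.

```python
import pandas as pd

# Hypothetical stand-in for color.csv (values made up for this sketch)
df = pd.DataFrame({
    "Color":  ["Red", "Blue", "Red", "Blue", "Red", "Blue", "Red", "Blue"],
    "Value1": [0.20, 0.95, 0.52, 0.65, 0.81, 0.15, 0.66, 0.63],
    "Value2": [0.98, 0.51, 0.36, 0.51, 0.31, 0.34, 0.68, 0.42],
})

dfs = []  # container for the per-group results
for key, group in df.groupby("Color"):
    group = group.copy()  # explicit copy, so adding a column cannot warn about chained assignment
    group["ROLL_CORR"] = group["Value1"].rolling(window=3).corr(group["Value2"])
    dfs.append(group)

# concat stacks the groups in group order (all Blue rows, then all Red rows) ...
df_final = pd.concat(dfs)
# ... so sort by the original index to get the original row order back
df_final = df_final.sort_index()
print(df_final)
```

Since the index survives the round trip, an alternative is to skip the reassembled frame and assign only the new column back to df, letting pandas align on the index.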
Hope this helps.
Answer 2 (score: 1)
I found a solution that works. It's quite simple.
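import pandas as pd

def roll_corr_groupby(x, i):
    # Rolling correlation of 'col 1' vs 'col 2' over i rows within one group (pandas < 0.23)
    x['Z'] = pd.rolling_corr(x['col 1'], x['col 2'], i)
    return x

# apply forwards the extra argument i to the function; a window of 10 is only an example value
x = x.groupby(['key']).apply(roll_corr_groupby, i=10)
x.head()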
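GroupBy.apply passes extra keyword arguments through to the applied function, which is how a window such as i can be supplied. A sketch for current pandas, which no longer has pd.rolling_corr: this adapted helper (its `window` parameter and the sample data are illustrative, not the answer's exact code) returns only the correlation Series, and group_keys=False keeps the original row labels so the result can be assigned straight back as a column.

```python
import pandas as pd

def roll_corr_groupby(g, window):
    """Rolling correlation of 'col 1' vs 'col 2' within one group (adapted, illustrative helper)."""
    return g["col 1"].rolling(window=window).corr(g["col 2"])

# Illustrative data reusing the column names from the answer above
x = pd.DataFrame({
    "key":   ["a", "b", "a", "b", "a", "b", "a", "b"],
    "col 1": [1.0, 2.0, 2.0, 1.0, 4.0, 3.0, 3.0, 5.0],
    "col 2": [2.0, 1.0, 3.0, 2.0, 3.0, 5.0, 5.0, 4.0],
})

# group_keys=False keeps the original row labels; window=3 is forwarded to the helper
x["Z"] = x.groupby("key", group_keys=False).apply(roll_corr_groupby, window=3)
print(x.head())
```

On pandas 2.2 and later this may emit a DeprecationWarning because the helper also receives the grouping column 'key'. On those versions, passing include_groups=False to apply avoids it.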