Question

I have two DataFrames with 200 columns each. For illustration, I'm only using 3 columns here.

Dataframe df1:

            A   B   C
1/4/2017    5   6   6
1/5/2017    5   2   1
1/6/2017    6   2   10
1/9/2017    1   9   10
1/10/2017   6   6   4
1/11/2017   6   1   1
1/12/2017   1   7   10
1/13/2017   8   9   6

Dataframe df2:

            A   D   B
1/4/2017    8   10  2
1/5/2017    2   1   8
1/6/2017    6   6   6
1/9/2017    1   8   1
1/10/2017   10  6   2
1/11/2017   10  2   4
1/12/2017   5   4   10
1/13/2017   5   2   8

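I want to compute the following table of rolling correlations for the matching columns of df1 and df2 (only A and B appear in both):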

            A       B
1/4/2017        
1/5/2017        
1/6/2017    0.19    -0.94
1/9/2017    0.79    -0.96
1/10/2017   0.90    -0.97
1/11/2017   1.00    -1.00
1/12/2017   1.00    0.42
1/13/2017   0.24    0.84

That is, for each column that appears in both df1 and df2, I need the correlation over a trailing 3-day window of history.

So, for 1/6/2017 I compute corr([5, 5, 6], [8, 2, 6]) = 0.19, where [5, 5, 6] comes from df1['A'] and [8, 2, 6] comes from df2['A'].
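The single window from the example can be checked directly. A minimal sketch using only NumPy: `np.corrcoef` returns the 2×2 correlation matrix of its two inputs, and the off-diagonal entry is the Pearson correlation of the two windows.

```python
import numpy as np

# first trailing 3-day window (1/4/2017 to 1/6/2017) of column A in each frame
x = [5, 5, 6]  # df1['A']
y = [8, 2, 6]  # df2['A']

# off-diagonal entry of the 2x2 correlation matrix = Pearson r of x and y
r = np.corrcoef(x, y)[0, 1]
print(round(r, 6))  # 0.188982
```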

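Since each frame has 200 columns, I find running two nested for loops very cumbersome: first looping over the columns, then over the trailing 3-day windows.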
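For reference, the nested-loop approach described above might look roughly like this. It is a sketch only, useful mainly as a baseline for checking faster versions: the frames are rebuilt from the sample data, `naive_rolling_corr` is a hypothetical helper name, and rows of the two frames are assumed to be paired by position (same dates in the same order).

```python
import numpy as np
import pandas as pd

idx = ['1/4/2017', '1/5/2017', '1/6/2017', '1/9/2017',
       '1/10/2017', '1/11/2017', '1/12/2017', '1/13/2017']
df1 = pd.DataFrame({'A': [5, 5, 6, 1, 6, 6, 1, 8],
                    'B': [6, 2, 2, 9, 6, 1, 7, 9],
                    'C': [6, 1, 10, 10, 4, 1, 10, 6]}, index=idx)
df2 = pd.DataFrame({'A': [8, 2, 6, 1, 10, 10, 5, 5],
                    'D': [10, 1, 6, 8, 6, 2, 4, 2],
                    'B': [2, 8, 6, 1, 2, 4, 10, 8]}, index=idx)

def naive_rolling_corr(d1, d2, window):
    """Hypothetical helper: outer loop over shared columns, inner loop over windows."""
    common = d1.columns.intersection(d2.columns)
    # rows without a full window stay NaN
    out = pd.DataFrame(np.nan, index=d1.index, columns=common)
    for col in common:
        for end in range(window, len(d1) + 1):
            a = d1[col].iloc[end - window:end].to_numpy()
            b = d2[col].iloc[end - window:end].to_numpy()
            # label each result with the last date of its window
            out.iloc[end - 1, out.columns.get_loc(col)] = np.corrcoef(a, b)[0, 1]
    return out

print(naive_rolling_corr(df1, df2, 3).round(6))
```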

Answer 1

Is this what you need?

import pandas as pd

l = []
# columns present in both frames (here: A and B)
common = df1.columns.intersection(df2.columns)
for x in common:
    # pd.rolling_corr was deprecated and later removed; use the .rolling() method
    l.append(df1[x].rolling(3).corr(df2[x]))

pd.concat(l, axis=1)


Out[13]: 
                  A         B
1/4/2017        NaN       NaN
1/5/2017        NaN       NaN
1/6/2017   0.188982 -0.944911
1/9/2017   0.785714 -0.960769
1/10/2017  0.896258 -0.968620
1/11/2017  1.000000 -0.998906
1/12/2017  1.000000  0.423415
1/13/2017  0.240192  0.838628
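The explicit column loop can probably be dropped as well. A sketch, assuming a pandas version where `Rolling.corr` accepts a DataFrame as `other`; with `pairwise` left at its default, only matching columns are correlated. The frames are first restricted to their shared columns with `DataFrame.align(..., join='inner', axis=1)`, since otherwise the unshared columns `C` and `D` may show up as all-NaN columns.

```python
import pandas as pd

idx = ['1/4/2017', '1/5/2017', '1/6/2017', '1/9/2017',
       '1/10/2017', '1/11/2017', '1/12/2017', '1/13/2017']
df1 = pd.DataFrame({'A': [5, 5, 6, 1, 6, 6, 1, 8],
                    'B': [6, 2, 2, 9, 6, 1, 7, 9],
                    'C': [6, 1, 10, 10, 4, 1, 10, 6]}, index=idx)
df2 = pd.DataFrame({'A': [8, 2, 6, 1, 10, 10, 5, 5],
                    'D': [10, 1, 6, 8, 6, 2, 4, 2],
                    'B': [2, 8, 6, 1, 2, 4, 10, 8]}, index=idx)

# keep only the columns present in both frames (A and B); rows already match
a, b = df1.align(df2, join='inner', axis=1)

# with a DataFrame `other` and the default pairwise setting, pandas correlates
# a['A'] with b['A'] and a['B'] with b['B'] over each 3-row window
result = a.rolling(3).corr(b)
print(result.round(6))
```

With 200 columns this is a single call instead of 200 Python-level iterations.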

Answer 2

Option 1
I build a generator and wrap it in pd.concat.

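import pandas as pd

def rolling_corrwith(d1, d2, window):
    # keep only the rows and columns the two frames share
    d1, d2 = d1.align(d2, join='inner')
    for i in range(len(d1) - window + 1):
        j = i + window
        # column-wise correlation of one window, labelled with the window's last date
        yield d1.iloc[i:j].corrwith(d2.iloc[i:j]).rename(d1.index[j-1])

pd.concat(list(rolling_corrwith(df1, df2, 3)), axis=1).T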

                  A         B
1/6/2017   0.188982 -0.944911
1/9/2017   0.785714 -0.960769
1/10/2017  0.896258 -0.968620
1/11/2017  1.000000 -0.998906
1/12/2017  1.000000  0.423415
1/13/2017  0.240192  0.838628

Option 2
Use numpy strides. I don't recommend this approach, but it's worth mentioning for anyone interested.

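import pandas as pd
from numpy.lib.stride_tricks import as_strided as strided

def sprp(v, w):
    # view a 2-D array of shape (n, m) as overlapping windows of shape
    # (n + 1 - w, w, m); no data is copied, so never write into the result
    s0, s1 = v.strides
    n, m = v.shape
    return strided(v, (n + 1 - w, w, m), (s0, s0, s1))

def rolling_corrwith2(d1, d2, window):
    d1, d2 = d1.align(d2, join='inner')

    s1 = sprp(d1.values, window)
    s2 = sprp(d2.values, window)

    # per-window means (kept 3-D for broadcasting) and population std devs
    m1 = s1.mean(1, keepdims=True)
    m2 = s2.mean(1, keepdims=True)
    z1 = s1.std(1)
    z2 = s2.std(1)

    # population covariance / (std1 * std2) = Pearson correlation
    c  = ((s1 - m1) * (s2 - m2)).sum(1) / z1 / z2 / window

    return pd.DataFrame(c, d1.index[window - 1:], d1.columns)

rolling_corrwith2(df1, df2, 3)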

                  A         B
1/6/2017   0.188982 -0.944911
1/9/2017   0.785714 -0.960769
1/10/2017  0.896258 -0.968620
1/11/2017  1.000000 -0.998906
1/12/2017  1.000000  0.423415
1/13/2017  0.240192  0.838628
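On NumPy 1.20 or later, `numpy.lib.stride_tricks.sliding_window_view` builds the same windows as `sprp` without hand-written strides, and the view it returns is read-only by default, which avoids the main hazard of `as_strided`. A sketch under that version assumption; `rolling_corrwith3` is a hypothetical name, and the first `window - 1` rows are filled with NaN to match the layout in the question.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

idx = ['1/4/2017', '1/5/2017', '1/6/2017', '1/9/2017',
       '1/10/2017', '1/11/2017', '1/12/2017', '1/13/2017']
df1 = pd.DataFrame({'A': [5, 5, 6, 1, 6, 6, 1, 8],
                    'B': [6, 2, 2, 9, 6, 1, 7, 9],
                    'C': [6, 1, 10, 10, 4, 1, 10, 6]}, index=idx)
df2 = pd.DataFrame({'A': [8, 2, 6, 1, 10, 10, 5, 5],
                    'D': [10, 1, 6, 8, 6, 2, 4, 2],
                    'B': [2, 8, 6, 1, 2, 4, 10, 8]}, index=idx)

def rolling_corrwith3(d1, d2, window):
    """Hypothetical helper: vectorised rolling Pearson correlation of shared columns."""
    d1, d2 = d1.align(d2, join='inner')
    # shape (n - window + 1, n_cols, window): the window axis is appended last
    w1 = sliding_window_view(d1.to_numpy(dtype=float), window, axis=0)
    w2 = sliding_window_view(d2.to_numpy(dtype=float), window, axis=0)
    # deviations from each window's mean
    c1 = w1 - w1.mean(axis=-1, keepdims=True)
    c2 = w2 - w2.mean(axis=-1, keepdims=True)
    # Pearson r = sum(c1*c2) / sqrt(sum(c1**2) * sum(c2**2))
    r = (c1 * c2).sum(-1) / np.sqrt((c1 ** 2).sum(-1) * (c2 ** 2).sum(-1))
    out = pd.DataFrame(r, index=d1.index[window - 1:], columns=d1.columns)
    # prepend NaN rows for dates that don't have a full window yet
    return out.reindex(d1.index)

print(rolling_corrwith3(df1, df2, 3).round(6))
```

Like Option 2, a window in which either column is constant divides by zero and yields NaN (with a NumPy runtime warning).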
