Question

I have the following dataframe structure:


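What I need to do is calculate the correlation between industries over x days. For example, I need to know the correlation of roc_sector + mean between Health and Mining over the past 3 days.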

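I have been trying a few things with pandas df.corr() and pd.rolling_corr(), but without any success: I can't seem to reshape the dataframe from its current structure (as above) into a form that gives me the correlation I need for each industry over x days.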

Answer 1

You can do this by performing an appropriate unstack and then a regular rolling_corr.

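First set industry as the index (or as part of the index). Then unstack the appropriate index level, using the link above as an example. In the resulting dataframe, simply use rolling_corr on the industry columns.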
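The steps above can be sketched roughly as follows. This is only an illustration under some assumptions: the column names `date`, `industry` and `mean` and the sample values are made up for the example, and because the top-level `pd.rolling_corr` function is no longer available in recent pandas releases, the sketch swaps in the `.rolling(...).corr(...)` method instead.

```python
import pandas as pd

# Illustrative long-format data: one row per (date, industry).
# Column names and values are assumptions made for this sketch.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2015-03-15", "2015-03-15",
        "2015-03-16", "2015-03-16",
        "2015-03-17", "2015-03-17",
    ]),
    "industry": ["Health", "Mining"] * 3,
    "mean": [123, 456, 346, 234, 345, 876],
})

# Make industry part of the index, then unstack that level so that
# each industry becomes its own column, indexed by date.
wide = df.set_index(["date", "industry"])["mean"].unstack("industry")

# Rolling correlation between the two industry columns.
# The window counts rows (here: one row per day), not calendar days.
window = 3
rolling = wide["Health"].rolling(window).corr(wide["Mining"])
print(rolling)
```

The first `window - 1` entries of `rolling` are NaN, because a full window of observations is not yet available for those dates.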

Answer 2

Is this what you are trying to do? Suppose this df is your dataframe:

In [43]: df
Out[43]: 
         date industry  mean  max  min  count
0  2015-03-15   Health   123  675   12      6
1  2015-03-15   Mining   456  687   11      9
2  2015-03-16   Health   346  547   34      8
3  2015-03-16   Mining   234  879   34      2
4  2015-03-17   Health   345  875   54      6
5  2015-03-17   Mining   876  987   23      7

In [44]: x = df.pivot(index='date', columns='industry', values='mean')

In [45]: x
Out[45]: 
industry    Health  Mining
date                      
2015-03-15     123     456
2015-03-16     346     234
2015-03-17     345     876

In [46]: x.corr()
Out[46]: 
industry    Health    Mining
industry                    
Health    1.000000  0.171471
Mining    0.171471  1.000000
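To restrict the correlation to the last x days, one option is to slice or roll over the pivoted frame. The following is a minimal sketch under some assumptions: it rebuilds the sample frame shown above, the name `n_days` is introduced here only for illustration, and each row of the pivoted frame is taken to be one day.

```python
import pandas as pd

# Rebuild the sample frame from the session above (mean column only).
df = pd.DataFrame({
    "date": ["2015-03-15", "2015-03-15", "2015-03-16",
             "2015-03-16", "2015-03-17", "2015-03-17"],
    "industry": ["Health", "Mining"] * 3,
    "mean": [123, 456, 346, 234, 345, 876],
})
x = df.pivot(index="date", columns="industry", values="mean")

n_days = 3  # hypothetical name for the look-back length "x"

# Correlation matrix over only the most recent n_days rows.
last_corr = x.tail(n_days).corr()
print(last_corr)

# Pairwise rolling correlation: one correlation matrix per date,
# stacked into a frame whose rows are indexed by (date, industry).
pairwise = x.rolling(n_days).corr()
latest = pairwise.loc[("2015-03-17", "Health"), "Mining"]
print(latest)
```

With only three dates in the sample and a window of three, both results coincide with the `x.corr()` output above; on a longer history they would differ from the full-sample correlation.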

Pandas correlation
