Question

I am trying to compute the correlation between two time series. I tried the following code:

import numpy as np

time1 = np.arange(0,1000,1).reshape((-1,1))
slope1 = 15
slope2 = 3
amp=1000

line1 = time1*slope1+amp
line2=time1*(0.5)+amp/10

corr=np.corrcoef(x=line1,y=line2,rowvar = False)

The output is

corr = [[1. 1.][1. 1.]]

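I expected that, because the two lines have different slopes, the correlation would be much less than 1. Why does the correlation come out as 1?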

Answer 1

Although the slopes are very different, you can think of correlation as something that ignores scale and looks only at the direction of travel. When one of your variables increases by some amount, the other increases by x1 times that amount, where x1 is a constant, so they are perfectly correlated (each always behaves the same way relative to the other).
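A minimal sketch of that point, reusing the slopes and offsets from the question. The noisy series, its noise scale, and the random seed are assumptions added purely for illustration; they are not part of the original post.

```python
import numpy as np

# Same setup as in the question: two straight lines over the same time axis.
time1 = np.arange(0, 1000, 1)
line1 = time1 * 15 + 1000
line2 = time1 * 0.5 + 100

# Pearson correlation does not change under positive scaling or shifting,
# so two increasing straight lines are perfectly correlated.
r_lines = np.corrcoef(line1, line2)[0, 1]

# Flipping the sign of one series flips the sign of the correlation.
r_flipped = np.corrcoef(line1, -line2)[0, 1]

# Hypothetical noisy series (seeded so the run is repeatable): the exact
# linear relationship is gone, so the correlation drops below 1.
rng = np.random.default_rng(0)
noisy = line2 + rng.normal(scale=500.0, size=time1.size)
r_noisy = np.corrcoef(line1, noisy)[0, 1]

print(r_lines, r_flipped, r_noisy)
```

The correlation only falls below 1 in magnitude when one series is no longer an exact linear function of the other. A different slope alone does not do that.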

Answer 2

If you want to express the correlation the way Excel's R^2 does, you can use something like this (it has worked for me):

def correlation(Measure, Fit):
    """Calculates the coefficient of determination R^2 between the two sets
       of Y data provided. For the result to make sense, both Y arrays
       should have been created from the same X array."""

    Mean = np.mean(Measure)
    s1 = 0
    s2 = 0
    Size = np.size(Measure) # identical to np.size(Fit)

    for i in range(0, Size):
        s1 += (Measure[i] - Fit[i]) ** 2
        s2 += (Measure[i] - Mean) ** 2
    Rsquare = 1 - s1/s2
    return Rsquare

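I removed them for readability, but you can wrap this in various precautions and error messages, for example for when the two arrays differ in size or contain NaN.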
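One way those precautions might look is sketched below. The function name `r_squared_checked`, the choice of `ValueError`, and the error messages are assumptions for illustration, not part of the original answer. The loop is also replaced by vectorized NumPy operations, which compute the same quantity.

```python
import numpy as np

def r_squared_checked(measure, fit):
    """Coefficient of determination R^2 with basic input checks (illustrative)."""
    measure = np.asarray(measure, dtype=float).ravel()
    fit = np.asarray(fit, dtype=float).ravel()

    # Both arrays must describe the same set of points.
    if measure.shape != fit.shape:
        raise ValueError("measure and fit must have the same number of elements")
    # NaN would silently propagate into the result, so reject it up front.
    if np.isnan(measure).any() or np.isnan(fit).any():
        raise ValueError("inputs must not contain NaN")

    ss_res = np.sum((measure - fit) ** 2)             # residual sum of squares
    ss_tot = np.sum((measure - measure.mean()) ** 2)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("measure is constant, so R^2 is undefined")
    return 1.0 - ss_res / ss_tot
```

Unlike `np.corrcoef`, this quantity compares the values themselves, so it is sensitive to scale and offset.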

Edit: the formula used comes from the Wikipedia article on the coefficient of determination.
