Question

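I am learning numpy/scipy, coming from a MATLAB background. The xcorr function in Matlab has an optional argument "maxlag" that limits the lag range to between -maxlag and maxlag. This is very useful if you are looking at the cross-correlation between two very long time series but are only interested in the correlation within a certain time range. Since cross-correlation is very expensive to compute, the performance gain is huge.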

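In numpy/scipy there seem to be several options for computing cross-correlation: numpy.correlate, numpy.convolve, scipy.signal.fftconvolve. If someone wishes to explain the difference between these, I'd be happy to hear it, but what mainly troubles me is that none of them has a maxlag feature. This means that even if I only want to see the correlation between two time series at lags between -100 and +100 ms, for example, it will still calculate the correlation for every lag between -20000 and +20000 ms (which is the length of the time series). That is a 200x performance hit! Do I have to recode the cross-correlation function by hand to include this feature?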
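
As a baseline for the three functions named above, here is a minimal sketch on made-up random data (the variable names are illustrative only). It computes the full cross-correlation with numpy.correlate, slices out lags -maxlag..maxlag, and checks that the same values come from numpy.convolve with the second input reversed, and from an FFT-based convolution written with numpy.fft, which is the approach scipy.signal.fftconvolve takes.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(50)
y = rng.standard_normal(50)
maxlag = 5

# Cross-correlation at every lag, then keep only lags -maxlag..maxlag.
full = np.correlate(x, y, mode="full")
mid = len(y) - 1                      # index of lag 0 in the 'full' output
limited = full[mid - maxlag: mid + maxlag + 1]

# Correlation equals convolution with the second input reversed
# (and conjugated, for complex data).
via_convolve = np.convolve(x, y[::-1].conj(), mode="full")

# The same linear convolution computed through zero-padded FFTs.
n = len(x) + len(y) - 1
via_fft = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(y[::-1], n), n)

print(np.allclose(full, via_convolve), np.allclose(full, via_fft))
print(limited.shape)
```

The slice costs nothing, but the full correlation has already been computed by then, which is exactly the overhead described above.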

Answer 1

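Here are a couple of functions to compute auto- and cross-correlation with limited lags. The order of multiplication (and conjugation, in the complex case) was chosen to match the corresponding behavior of numpy.correlate.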

import numpy as np
from numpy.lib.stride_tricks import as_strided


def _check_arg(x, xname):
    x = np.asarray(x)
    if x.ndim != 1:
        raise ValueError('%s must be one-dimensional.' % xname)
    return x

def autocorrelation(x, maxlag):
    """
    Autocorrelation with a maximum number of lags.

    `x` must be a one-dimensional numpy array.

    This computes the same result as
        numpy.correlate(x, x, mode='full')[len(x)-1:len(x)+maxlag]

    The return value has length maxlag + 1.
    """
    x = _check_arg(x, 'x')
    p = np.pad(x.conj(), maxlag, mode='constant')
    T = as_strided(p[maxlag:], shape=(maxlag+1, len(x) + maxlag),
                   strides=(-p.strides[0], p.strides[0]))
    return T.dot(p[maxlag:].conj())


def crosscorrelation(x, y, maxlag):
    """
    Cross correlation with a maximum number of lags.

    `x` and `y` must be one-dimensional numpy arrays with the same length.

    This computes the same result as
        numpy.correlate(x, y, mode='full')[len(x)-maxlag-1:len(x)+maxlag]

    The return value has length 2*maxlag + 1.
    """
    x = _check_arg(x, 'x')
    y = _check_arg(y, 'y')
    py = np.pad(y.conj(), 2*maxlag, mode='constant')
    T = as_strided(py[2*maxlag:], shape=(2*maxlag+1, len(y) + 2*maxlag),
                   strides=(-py.strides[0], py.strides[0]))
    px = np.pad(x, maxlag, mode='constant')
    return T.dot(px)

For example,

In [367]: x = np.array([2, 1.5, 0, 0, -1, 3, 2, -0.5])

In [368]: autocorrelation(x, 3)
Out[368]: array([ 20.5,   5. ,  -3.5,  -1. ])

In [369]: np.correlate(x, x, mode='full')[7:11]
Out[369]: array([ 20.5,   5. ,  -3.5,  -1. ])

In [370]: y = np.arange(8)

In [371]: crosscorrelation(x, y, 3)
Out[371]: array([  5. ,  23.5,  32. ,  21. ,  16. ,  12.5,   9. ])

In [372]: np.correlate(x, y, mode='full')[4:11]
Out[372]: array([  5. ,  23.5,  32. ,  21. ,  16. ,  12.5,   9. ])

(It would be nice to have a feature like this in numpy itself.)

Answer 2

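matplotlib.pyplot provides MATLAB-like syntax for computing and plotting cross-correlation, auto-correlation, etc.

You can use xcorr, which lets you set the maxlags parameter.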

    import matplotlib.pyplot as plt
    import numpy as np

    data = np.arange(0, 2*np.pi, 0.01)
    y1 = np.sin(data)
    y2 = np.cos(data)

    coeff = plt.xcorr(y1, y2, maxlags=10)
    print(*coeff)


[-10  -9  -8  -7  -6  -5  -4  -3  -2  -1   0   1   2   3   4   5   6   7
   8   9  10] [ -9.81991753e-02  -8.85505028e-02  -7.88613080e-02  -6.91325329e-02
  -5.93651264e-02  -4.95600447e-02  -3.97182508e-02  -2.98407146e-02
  -1.99284126e-02  -9.98232812e-03  -3.45104289e-06   9.98555430e-03
   1.99417667e-02   2.98641953e-02   3.97518558e-02   4.96037706e-02
   5.94189688e-02   6.91964864e-02   7.89353663e-02   8.86346584e-02
   9.82934198e-02] <matplotlib.collections.LineCollection object at 0x00000000074A9E80> Line2D(_line0)
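
The coefficients printed above are small because plt.xcorr normalizes by default (normed=True), dividing by sqrt(dot(x, x) * dot(y, y)). The sketch below reproduces the lags and normalized coefficients with numpy alone, so no figure is created. The function name xcorr_coeffs is hypothetical, and the code assumes real-valued inputs of equal length.

```python
import numpy as np

def xcorr_coeffs(x, y, maxlags=10, normed=True):
    """Hypothetical helper: lags and cross-correlation for |lag| <= maxlags."""
    x = np.asarray(x)
    y = np.asarray(y)
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if not 0 < maxlags < len(x):
        raise ValueError("maxlags must satisfy 0 < maxlags < len(x)")
    c = np.correlate(x, y, mode="full")
    if normed:
        # Scale so that two identical inputs give 1.0 at lag 0.
        c = c / np.sqrt(np.dot(x, x) * np.dot(y, y))
    lags = np.arange(-maxlags, maxlags + 1)
    mid = len(x) - 1                  # index of lag 0 in the 'full' output
    return lags, c[mid - maxlags: mid + maxlags + 1]

data = np.arange(0, 2 * np.pi, 0.01)
lags, c = xcorr_coeffs(np.sin(data), np.cos(data), maxlags=10)
print(lags)
print(c)
```

Like plt.xcorr, this still computes every lag before slicing, so it saves the plotting but not the computation.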

Answer 3

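Until numpy implements the maxlag argument, you can use the function ucorrelate from the pycorrelate package. ucorrelate operates on numpy arrays and has a maxlag keyword. It implements the correlation with a for-loop and optimizes the execution speed with numba.

Example - autocorrelation with 3 time lags: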

import numpy as np
import pycorrelate as pyc

x = np.array([2, 1.5, 0, 0, -1, 3, 2, -0.5])
c = pyc.ucorrelate(x, x, maxlag=3)
c

Result:

Out[1]: array([20,  5, -3])

The pycorrelate documentation contains a notebook showing a perfect match between pycorrelate.ucorrelate and numpy.correlate.
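
To illustrate what a for-loop correlation with a lag limit looks like, here is a plain numpy sketch. It is not pycorrelate's code: loop_correlate is a hypothetical name, it works in floating point, and it follows the example above in returning maxlag values (lags 0 to maxlag-1).

```python
import numpy as np

def loop_correlate(t, u, maxlag):
    """Hypothetical helper: sum of t[n] * u[n + lag] for lag = 0..maxlag-1."""
    t = np.asarray(t, dtype=float)
    u = np.asarray(u, dtype=float)
    maxlag = min(int(maxlag), len(u))
    out = np.zeros(maxlag)
    for lag in range(maxlag):
        n = min(len(u) - lag, len(t))      # number of overlapping samples
        out[lag] = np.dot(t[:n], u[lag:lag + n])
    return out

x = np.array([2, 1.5, 0, 0, -1, 3, 2, -0.5])
c = loop_correlate(x, x, maxlag=3)
print(c)
print(np.correlate(x, x, mode="full")[7:10])   # same lags from numpy.correlate
```

Only maxlag dot products are evaluated, so the cost grows with the number of lags requested rather than with the full lag range. A loop of this shape is the kind of code that numba can compile, which is an assumption about how one might speed it up, not something shown here.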

Answer 4

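I ran into the same problem some time ago, and I cared more about the efficiency of the computation. Referring to the source code of MATLAB's xcorr.m, I wrote a simple version.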

import numpy as np
from scipy import signal, fftpack
import math
import time

def nextpow2(x):
    if x == 0:
        y = 0
    else:
        y = math.ceil(math.log2(x))
    return y


def xcorr(x, y, maxlag):
    m = max(len(x), len(y))
    mx1 = min(maxlag, m - 1)
    ceilLog2 = nextpow2(2 * m - 1)
    m2 = 2 ** ceilLog2

    X = fftpack.fft(x, m2)
    Y = fftpack.fft(y, m2)
    c1 = np.real(fftpack.ifft(X * np.conj(Y)))
    index1 = np.arange(1, mx1+1, 1) + (m2 - mx1 -1)
    index2 = np.arange(1, mx1+2, 1) - 1
    c = np.hstack((c1[index1], c1[index2]))
    return c




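if __name__ == "__main__":
    s = time.perf_counter()  # time.clock() was removed in Python 3.8
    a = [1, 2, 3, 4, 5]
    b = [6, 7, 8, 9, 10]
    c = xcorr(a, b, 3)
    e = time.perf_counter()
    print(c)
    print(e - s)  # elapsed seconds (the original subtracted the result array c)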

Taking the output of one particular run as an example:

[ 29.  56.  90. 130. 110.  86.  59.]
0.0001745000000001884

Compared with the MATLAB code:

clear;close all;clc
tic
a = [1, 2, 3, 4, 5];
b = [6, 7, 8, 9, 10];
c = xcorr(a, b, 3)
toc


   29.0000   56.0000   90.0000  130.0000  110.0000   86.0000   59.0000

Elapsed time is 0.000279 seconds.

If anyone can give a rigorous mathematical derivation of this, that would be very helpful.
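
One way to justify the FFT approach is the correlation theorem for the discrete Fourier transform. The sketch below assumes sequences x and y of length at most m, zero-padded to a length M >= 2m - 1 (the code uses the next power of two, m2). The symbols c, \tilde c, X and Y are notation introduced here, not taken from xcorr.m.

```latex
\[
\begin{aligned}
c[k] &= \sum_{n} x[n+k]\,\overline{y[n]}, \qquad |k| \le m-1
  && \text{(linear cross-correlation)}\\
X[f] &= \sum_{n=0}^{M-1} x[n]\,e^{-2\pi i f n/M}, \qquad
Y[f] = \sum_{p=0}^{M-1} y[p]\,e^{-2\pi i f p/M}
  && \text{(DFTs of the zero-padded inputs)}\\
\tilde c[k] &= \frac{1}{M}\sum_{f=0}^{M-1} X[f]\,\overline{Y[f]}\,e^{2\pi i f k/M}
  = \sum_{n=0}^{M-1}\sum_{p=0}^{M-1} x[n]\,\overline{y[p]}\;
    \frac{1}{M}\sum_{f=0}^{M-1} e^{2\pi i f (k+p-n)/M}\\
 &= \sum_{p=0}^{M-1} x[(p+k) \bmod M]\,\overline{y[p]}
  && \text{(circular cross-correlation)}
\end{aligned}
\]
```

The inner sum over f equals 1 when n = (p + k) mod M and 0 otherwise, which gives the last line. Because M >= 2m - 1, every index that wraps around lands on a padding zero, so \tilde c[k mod M] = c[k] for |k| <= m - 1. Non-negative lags therefore sit at indices 0..maxlag of the inverse FFT (index2 in the code), and a negative lag -k sits at index M - k (index1). Taking np.real only discards round-off noise when both inputs are real.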

Answer 5

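I think I have found a solution, as I was facing the same problem:

If you have two vectors x and y of any length N, and want a cross-correlation with a window of fixed length m, you can do: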

x = <some_data>
y = <some_data>

# Trim your variables (window is the window length m, in samples)
x_short = x[window:]
y_short = y[window:]

# do two xcorrelations, lagging x and y respectively
left_xcorr = np.correlate(x, y_short)  #defaults to 'valid'
right_xcorr = np.correlate(x_short, y) #defaults to 'valid'

# combine the xcorrelations
# note the first value of right_xcorr is the same as the last of left_xcorr
xcorr = np.concatenate((left_xcorr, right_xcorr[1:]))  # concatenate takes a sequence of arrays

If you want a bounded correlation, remember that you may need to normalise the variables.
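
A self-contained sketch of this approach on made-up random data is shown below. The function name windowed_xcorr and the normalize flag are hypothetical. Note that each lag is summed over only N - window samples, so interior lags are truncated versions of what numpy.correlate returns in 'full' mode; only the two outermost lags coincide exactly.

```python
import numpy as np

def windowed_xcorr(x, y, window, normalize=False):
    """Hypothetical helper: cross-correlation for lags -window..window."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if not 0 < window < len(x):
        raise ValueError("window must satisfy 0 < window < len(x)")
    left = np.correlate(x, y[window:], mode="valid")    # lags -window..0
    right = np.correlate(x[window:], y, mode="valid")   # lags 0..window
    c = np.concatenate((left, right[1:]))               # drop the duplicate lag 0
    if normalize:
        # Dividing by the product of norms bounds every value to [-1, 1].
        c = c / (np.linalg.norm(x) * np.linalg.norm(y))
    lags = np.arange(-window, window + 1)
    return lags, c

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = rng.standard_normal(200)
lags, c = windowed_xcorr(x, y, window=10)
_, c_norm = windowed_xcorr(x, y, window=10, normalize=True)
print(lags)
print(c_norm)
```

The two 'valid' calls together evaluate 2*window + 2 dot products instead of 2*N - 1, which is where the saving comes from.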

Answer 6

Here is another answer, sourced from here, which seems marginally faster than np.correlate and has the benefit of returning a normalised correlation:

import numpy as np

def rolling_window(a, window):
    # Strided view of `a` whose last axis holds each length-`window` slice (no copy)
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

def xcorr(x, y):
    # Normalised correlation of y against every length-M window of x (needs len(x) >= len(y))
    N = len(x)
    M = len(y)
    meany = np.mean(y)
    stdy = np.std(np.asarray(y))
    tmp = rolling_window(np.asarray(x), M)
    c = np.sum((y-meany)*(tmp-np.reshape(np.mean(tmp,-1),(N-M+1,1))),-1)/(M*np.std(tmp,-1)*stdy)

    return c
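
The same quantity can be sketched with numpy.lib.stride_tricks.sliding_window_view, which builds the windowed view without hand-written strides. This assumes NumPy 1.20 or later. The name sliding_pearson is hypothetical, and a constant window would cause a division by zero, which the sketch does not guard against.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sliding_pearson(x, y):
    """Hypothetical helper: Pearson r between y and each length-len(y) window of x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    windows = sliding_window_view(x, len(y))          # shape (N - M + 1, M)
    centered = windows - windows.mean(axis=1, keepdims=True)
    num = centered @ (y - y.mean())                   # covariance sums per window
    den = len(y) * windows.std(axis=1) * y.std()
    return num / den

rng = np.random.default_rng(0)
x = rng.standard_normal(60)
y = rng.standard_normal(8)
c = sliding_pearson(x, y)
print(c.shape)
```

Each entry is the Pearson coefficient for one alignment of y inside x, so the values lie in [-1, 1] and can be compared across alignments.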

Answer 7

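I answered here, https://stackoverflow.com/a/47897581/5122657: matplotlib.xcorr has a maxlags parameter. It is actually a wrapper around numpy.correlate, so there is no performance saving. Nevertheless, it gives exactly the same result as Matlab's cross-correlation function. Below I edited the code from matplotlib so that it returns only the correlation. The reason is that if we use matplotlib.xcorr as it is, it returns the plot as well. The problem is that if we pass complex data as the arguments, we get a "casting complex to real datatype" warning when matplotlib tries to draw the plot.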

import numpy as np
import matplotlib.pyplot as plt

def xcorr(x, y, maxlags=10):
    Nx = len(x)
    if Nx != len(y):
        raise ValueError('x and y must be equal length')

    c = np.correlate(x, y, mode='full')  # 'full' replaces the legacy integer mode 2

    if maxlags is None:
        maxlags = Nx - 1

    if maxlags >= Nx or maxlags < 1:
        raise ValueError('maxlags must be None or strictly positive < %d' % Nx)

    c = c[Nx - 1 - maxlags:Nx + maxlags]

    return c

Answer 8

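@Warren Weckesser's answer is the best, as it leverages numpy to get the performance savings (rather than calling corr for each lag). However, it returns the cross-product (e.g. the dot product between the inputs at various lags). To get the actual cross-correlation, I modified his answer with an optional mode argument which, if set to 'corr', returns the cross-correlation, as follows: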

import numpy as np
from numpy.lib.stride_tricks import as_strided

def crosscorrelation(x, y, maxlag, mode='corr'):
    """
    Cross correlation with a maximum number of lags.

    `x` and `y` must be one-dimensional numpy arrays with the same length.

    This computes the same result as
        numpy.correlate(x, y, mode='full')[len(x)-maxlag-1:len(x)+maxlag]

    The return value has length 2*maxlag + 1.
    """
    py = np.pad(y.conj(), 2*maxlag, mode='constant')
    T = as_strided(py[2*maxlag:], shape=(2*maxlag+1, len(y) + 2*maxlag),
                   strides=(-py.strides[0], py.strides[0]))
    px = np.pad(x, maxlag, mode='constant')
    if mode == 'dot':       # get lagged dot product
        return T.dot(px)
    elif mode == 'corr':    # gets Pearson correlation
        return (T.dot(px)/px.size - (T.mean(axis=1)*px.mean())) / \
               (np.std(T, axis=1) * np.std(px))
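
For comparison, here is a different and slower technique: a plain loop that calls np.corrcoef on only the overlapping samples at each lag, so no padding zeros enter the means or standard deviations. The name lagged_pearson is hypothetical, and the lag sign follows numpy.correlate (lag k pairs x[n + k] with y[n]).

```python
import numpy as np

def lagged_pearson(x, y, maxlag):
    """Hypothetical helper: Pearson r of the overlapping samples for |lag| <= maxlag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if not 0 <= maxlag <= len(x) - 2:
        raise ValueError("maxlag must leave at least two overlapping samples")
    lags = np.arange(-maxlag, maxlag + 1)
    r = np.empty(len(lags))
    for i, k in enumerate(lags):
        if k >= 0:
            xs, ys = x[k:], y[:len(y) - k]     # pairs x[n + k] with y[n]
        else:
            xs, ys = x[:len(x) + k], y[-k:]    # same pairing for negative k
        r[i] = np.corrcoef(xs, ys)[0, 1]
    return lags, r

rng = np.random.default_rng(1)
s = rng.standard_normal(110)
x = s[5:105]
y = s[2:102]            # x[n] == y[n + 3], so the peak is at lag -3
lags, r = lagged_pearson(x, y, maxlag=5)
best = lags[np.argmax(r)]
print(best, r.max())
```

This costs one corrcoef call per lag, so it trades the speed of the strided version for a definition that is easy to check by hand.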

How to limit the cross-correlation window width in Numpy?
