Question

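I would like to compute the autocovariance of three arrays X1, X2 and Y, which are all stationary random processes. Is there a function in SciPy or another library that can do this?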

Answer 1

Statsmodels has auto- and cross-covariance functions:

http://statsmodels.sourceforge.net/devel/generated/statsmodels.tsa.stattools.acovf.html
http://statsmodels.sourceforge.net/devel/generated/statsmodels.tsa.stattools.ccovf.html

plus the correlation functions and partial autocorrelation: http://statsmodels.sourceforge.net/devel/tsa.html#descriptive-statistics-and-tests

Answer 2

The standard estimate of the autocovariance coefficient for a discrete signal can be expressed by the equation:

$$c_k = \frac{1}{N-1} \sum_{i=1}^{N-k} \bigl(x(i) - x_s\bigr)\bigl(x(i+k) - x_s\bigr)$$

...where x(i) is the given signal (i.e. a specific 1D vector), k stands for the shift of the x(i) signal by k samples, N is the length of the x(i) signal, and:

$$x_s = \frac{1}{N} \sum_{i=1}^{N} x(i)$$

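...which is simply the sample mean. In code, this estimator is implemented by the autocovariance function in the script below.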

If you would like to normalize the autocovariance coefficient, it becomes the autocorrelation coefficient, expressed as:

$$r_k = \frac{c_k}{c_0}$$

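...then you only need to add two more lines to the code (the autocorrelation function). Here is the full script:

'''
Calculate the autocovariance and autocorrelation coefficients.
'''

import numpy as np

Xi = np.array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5])
N = np.size(Xi)
k = 5                  # lag, in samples
Xs = np.average(Xi)    # sample mean

def autocovariance(Xi, N, k, Xs):
    # Sum the products of mean-centered values that are k samples apart.
    autoCov = 0
    for i in np.arange(0, N-k):
        autoCov += ((Xi[i+k])-Xs)*(Xi[i]-Xs)
    return (1/(N-1))*autoCov

def autocorrelation():
    # Normalize by the lag-0 autocovariance.
    return autocovariance(Xi, N, k, Xs) / autocovariance(Xi, N, 0, Xs)

print("Autocovariance:", autocovariance(Xi, N, k, Xs))
print("Autocorrelation:", autocorrelation())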
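As a quick hand check of the script above on its example series (N = 15, k = 5), here is the arithmetic worked out, writing c_k for the autocovariance at lag k, r_k for the autocorrelation and x_s for the mean. The series has period 5, so x(i+5) = x(i) and the centered values repeat as -2, -1, 0, 1, 2:

```latex
\begin{aligned}
x_s &= \tfrac{1}{15} \cdot 3\,(1+2+3+4+5) = 3 \\
c_5 &= \tfrac{1}{14} \sum_{i=1}^{10} \bigl(x(i) - 3\bigr)\bigl(x(i+5) - 3\bigr)
     = \tfrac{1}{14} \cdot 2\,(4+1+0+1+4) = \tfrac{20}{14} \approx 1.4286 \\
c_0 &= \tfrac{1}{14} \sum_{i=1}^{15} \bigl(x(i) - 3\bigr)^2 = \tfrac{30}{14} \approx 2.1429 \\
r_5 &= \frac{c_5}{c_0} = \tfrac{2}{3} \approx 0.6667
\end{aligned}
```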

Answer 3

To get the sample autocovariance:

import numpy as np

# cov_auto_samp(X, delta) / cov_auto_samp(X, 0) gives the autocorrelation at lag delta
def cov_auto_samp(X,delta):
    N = len(X)
    Xs = np.average(X)
    autoCov = 0.0
    times = 0.0
    for i in np.arange(0, N-delta):
        autoCov += (X[i+delta]-Xs)*(X[i]-Xs)
        times +=1
    return autoCov/times
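Note that this version divides by the number of terms in the sum (N - delta), whereas Answer 2 divides by N - 1 and Answer 5 divides by N. A minimal sketch of how the choice of divisor changes the result; the `autocov` helper and its `norm` argument are hypothetical names introduced only for this comparison:

```python
import numpy as np

def autocov(x, lag, norm="N"):
    """Hypothetical helper: autocovariance at `lag` with a selectable divisor."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    # Sum of products of centered values that are `lag` samples apart.
    s = np.dot(xc[lag:], xc[:n - lag])
    divisor = {"N": n, "N-1": n - 1, "N-lag": n - lag}[norm]
    return s / divisor

x = [1, 2, 3, 4, 5] * 3   # same periodic series as in Answer 2

for norm in ("N", "N-1", "N-lag"):
    c5 = autocov(x, 5, norm)
    c0 = autocov(x, 0, norm)
    print(norm, "autocovariance:", round(c5, 4), "autocorrelation:", round(c5 / c0, 4))
```

For this series the N and N - 1 divisors both give an autocorrelation of 2/3 at lag 5, while the N - lag divisor gives exactly 1, so it is worth being explicit about which convention you use.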

Answer 4

A small tweak to the previous answers that avoids Python for loops and uses numpy array operations instead. This will be faster if you have a lot of data.

import numpy as np

def lagged_auto_cov(Xi,t):
    r"""
    for series of values x_i, length N, compute empirical auto-cov with lag t
    defined: 1/(N-1) * \sum_{i=0}^{N-t-1} ( x_i - x_s ) * ( x_{i+t} - x_s )
    """
    N = len(Xi)

    # use sample mean estimate from whole series
    Xs = np.mean(Xi)

    # construct copies of series shifted relative to each other, 
    # with mean subtracted from values
    end_padded_series = np.zeros(N+t)
    end_padded_series[:N] = Xi - Xs
    start_padded_series = np.zeros(N+t)
    start_padded_series[t:] = Xi - Xs

    auto_cov = 1./(N-1) * np.sum( start_padded_series*end_padded_series )
    return auto_cov

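Comparing this with @bluevoxel's code, using a time series of 50,000 data points and computing the autocorrelation for a single fixed lag, the Python for-loop code averaged about 30 milliseconds, while the numpy array version averaged under 0.3 milliseconds (running on my laptop).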
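The zero padding is not strictly required: the same sum can be taken over two overlapping slices of the centered series. The following is only a sketch under that assumption. Both function names are hypothetical, the data is random, and the loop version is included just to confirm that the two agree.

```python
import numpy as np

def lagged_auto_cov_sliced(x, t):
    """Hypothetical variant: same 1/(N-1) estimate, using slices instead of padding."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    # xc[t:] pairs each value with the value t samples earlier in xc[:n - t].
    return np.dot(xc[t:], xc[:n - t]) / (n - 1)

def lagged_auto_cov_loop(x, t):
    """Plain Python loop over the same sum, for comparison."""
    n = len(x)
    xs = sum(x) / n
    total = 0.0
    for i in range(n - t):
        total += (x[i + t] - xs) * (x[i] - xs)
    return total / (n - 1)

rng = np.random.default_rng(0)
series = rng.normal(size=1000)   # made-up data for illustration

print(lagged_auto_cov_sliced(series, 3), lagged_auto_cov_loop(series, 3))
```

To reproduce a timing comparison on your own machine, both functions can be passed to `timeit`; the absolute numbers will depend on hardware and series length.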

Answer 5

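@user333700 has the right answer. Using a library (such as statsmodels) is generally preferable to writing your own implementation. However, it is insightful to implement it yourself at least once.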

import numpy as np

def _check_autocovariance_input(x):
    if len(x) < 2:
        raise ValueError('Need at least two elements to calculate autocovariance')

def get_autocovariance_given_lag(x, lag):
    _check_autocovariance_input(x)

    x_centered = x - np.mean(x)
    a = np.pad(x_centered, pad_width=(0, lag), mode='constant')
    b = np.pad(x_centered, pad_width=(lag, 0), mode='constant')
    return np.dot(a, b) / len(x)

def get_autocovariance(x):
    _check_autocovariance_input(x)
    x_centered = x - np.mean(x)
    return np.correlate(x_centered, x_centered, mode='full')[len(x) - 1:] / len(x)

The get_autocovariance_given_lag function computes the autocovariance for a given lag.

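If you are interested in all lags, you can use get_autocovariance. The np.correlate function is what statsmodels uses under the hood. It computes the cross-correlation, which is a sliding dot product. For example, suppose the array is [1, 2, 3]. Then we get: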

      [1, 2, 3]      = 3 * 1 = 3
[1, 2, 3]

      [1, 2, 3]      = 2 * 1 + 3 * 2 = 8
   [1, 2, 3]


      [1, 2, 3]      = 1 * 1 + 2 * 2 + 3 * 3 = 14
      [1, 2, 3]

      [1, 2, 3]      = 2 * 1 + 3 * 2 = 8
         [1, 2, 3]

      [1, 2, 3]      = 3 * 1 = 3
            [1, 2, 3]

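Note, however, that we are interested in the covariances starting from lag 0. Where is that? It occurs after we have shifted N - 1 positions to the right, where N is the length of the array. That is why we return the array starting at index N - 1.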
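The sliding products in the diagram can be reproduced directly with np.correlate. This small sketch uses the raw array [1, 2, 3] without subtracting the mean, so that the numbers match the diagram rather than an actual autocovariance:

```python
import numpy as np

x = np.array([1, 2, 3])
n = len(x)

# All sliding dot products, from the leftmost shift to the rightmost one.
full = np.correlate(x, x, mode='full')
print(full.tolist())            # [3, 8, 14, 8, 3]

# Lag 0 sits at index n - 1; the entries after it are lags 1, 2, ...
from_lag_zero = full[n - 1:]
print(from_lag_zero.tolist())   # [14, 8, 3]
```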

How to compute autocovariance in Python
