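Estimating autocorrelation using Python

Time: 2013-01-12 19:19:25

Tags: python numpy signal-processing

I would like to perform an autocorrelation on the signal shown below. The time between two consecutive points is 2.5 ms (i.e. a repetition rate of 400 Hz).

[image: plot of the signal]

This is the equation for estimating the autocorrelation that I would like to use (taken from http://en.wikipedia.org/wiki/Autocorrelation, section Estimation):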

[image: the estimation equation]
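The image with the equation is not preserved here. Judging by the linked Wikipedia section and by the code in this thread, it is presumably the estimator below, where n is the number of samples, \mu the sample mean and \sigma^2 the sample variance. This is an assumed reconstruction, not a copy of the original image:

```latex
\hat{R}(k) = \frac{1}{(n-k)\,\sigma^{2}} \sum_{t=1}^{n-k} (X_t - \mu)(X_{t+k} - \mu)
```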

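What is the simplest way to find the estimated autocorrelation of my data in Python? Can I use something similar to numpy.correlate?

Or should I just calculate the mean and the variance?


Edit:

With the help of unutbu, I have written: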

import numpy as N
import pylab as P

fn = 'data.txt'
# Column 0 holds the time stamps, column 1 the signal.
time, x = N.loadtxt(fn, unpack=True, usecols=[0, 1])

def estimated_autocorrelation(x):
    n = len(x)
    variance = x.var()
    x = x-x.mean()
    r = N.correlate(x, x, mode = 'full')[-n:]
    #assert N.allclose(r, N.array([(x[:n-k]*x[-(n-k):]).sum() for k in range(n)]))
    result = r/(variance*(N.arange(n, 0, -1)))
    return result

P.plot(time,estimated_autocorrelation(x))
P.xlabel('time (s)')
P.ylabel('autocorrelation')
P.show()

5 answers:

Answer 0 (score: 29)

I don't think there is a NumPy function for this particular calculation. Here is how I would write it:

import numpy as np

def estimated_autocorrelation(x):
    """
    http://stackoverflow.com/q/14297012/190597
    http://en.wikipedia.org/wiki/Autocorrelation#Estimation
    """
    n = len(x)
    variance = x.var()
    x = x-x.mean()
    r = np.correlate(x, x, mode = 'full')[-n:]
    assert np.allclose(r, np.array([(x[:n-k]*x[-(n-k):]).sum() for k in range(n)]))
    result = r/(variance*(np.arange(n, 0, -1)))
    return result

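The assert statement is there both to check the calculation and to document its intent.

Once you are confident that this function behaves as expected, you can comment out the assert statement, or run your script with python -O. (The -O flag tells Python to ignore assert statements.)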
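As a quick sanity check, the estimator can be applied to a signal whose autocorrelation is known. The sketch below assumes a synthetic sine wave (40 samples per period and 400 samples in total, both arbitrary choices for illustration, not the asker's data). The function is repeated without the assert so that the block runs on its own.

```python
import numpy as np

def estimated_autocorrelation(x):
    # Same estimator as above, with the O(n^2) assert left out.
    n = len(x)
    variance = x.var()
    x = x - x.mean()
    r = np.correlate(x, x, mode='full')[-n:]   # lags 0 .. n-1
    return r / (variance * np.arange(n, 0, -1))

# Hypothetical test signal: 10 full periods of a sine, 40 samples per period.
period = 40
t = np.arange(400)
signal = np.sin(2 * np.pi * t / period)

rho = estimated_autocorrelation(signal)
print(rho[0], rho[period // 2], rho[period])
```

For this signal the estimate is 1 at lag 0, close to -1 at half a period and close to 1 at a full period, up to floating-point error. Values at very large lags are built from only a few products and are correspondingly less reliable.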

Answer 1 (score: 16)

I took part of the code from the pandas autocorrelation_plot() function. I checked the answers against R, and the values match exactly.

import numpy
def acf(series):
    n = len(series)
    data = numpy.asarray(series)
    mean = numpy.mean(data)
    c0 = numpy.sum((data - mean) ** 2) / float(n)

    def r(h):
        acf_lag = ((data[:n - h] - mean) * (data[h:] - mean)).sum() / float(n) / c0
        return round(acf_lag, 3)
    lags = numpy.arange(n)  # all lags 0 .. n-1 (lag 0 is included and equals 1.0)
    acf_coeffs = [r(h) for h in lags]  # a list; map() would be lazy on Python 3
    return acf_coeffs
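Note that acf() divides every lag by n, whereas estimated_autocorrelation in the other answer divides lag k by n - k, so the two give different values for k > 0. The sketch below illustrates the relation on made-up data; the function names acf_divide_by_n and acf_divide_by_n_minus_k are hypothetical and exist only for this comparison.

```python
import numpy as np

def acf_divide_by_n(x):
    # Every lag is divided by n (the convention of acf() above, without rounding).
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    r = np.correlate(d, d, mode='full')[-n:]
    return r / (n * x.var())

def acf_divide_by_n_minus_k(x):
    # Lag k is divided by n - k (the convention of estimated_autocorrelation).
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    r = np.correlate(d, d, mode='full')[-n:]
    return r / (x.var() * np.arange(n, 0, -1))

# Made-up data, for illustration only.
x = np.array([2.0, 4.0, 3.0, 7.0, 6.0, 5.0, 9.0, 8.0])
n = len(x)
lags = np.arange(n)

rho_n = acf_divide_by_n(x)
rho_nk = acf_divide_by_n_minus_k(x)

# The two estimates differ only by the factor n / (n - k).
print(np.allclose(rho_nk, rho_n * n / (n - lags)))
```

Dividing by n shrinks the estimate towards zero at large lags, while dividing by n - k removes that shrinkage at the cost of noisier values there. Which one is appropriate depends on the reference you are comparing against.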

Answer 2 (score: 11)

The statsmodels package adds an autocorrelation function that uses np.correlate internally (according to the statsmodels documentation).

See: http://statsmodels.sourceforge.net/stable/generated/statsmodels.tsa.stattools.acf.html#statsmodels.tsa.stattools.acf

Answer 3 (score: 7)

The method I wrote as of my latest edit is now faster than even scipy.statstools.acf with fft=True, until the sample size gets very large.

Error analysis

If you want to adjust for biases and get highly accurate error estimates, look at my code here, which implements this paper by Ulli Wolff (or the original by UW in Matlab).

Functions tested

  • a = correlatedData(n=10000) comes from a routine found here
  • gamma() comes from the same place as correlated_data()
  • acorr() is my function, given below
  • estimated_autocorrelation is found in another answer
  • acf() comes from from statsmodels.tsa.stattools import acf

Timings

%timeit a0, junk, junk = gamma(a, f=0)            # puwr.py
%timeit a1 = [acorr(a, m, i) for i in range(l)]   # my own
%timeit a2 = acf(a)                               # statstools
%timeit a3 = estimated_autocorrelation(a)         # numpy
%timeit a4 = acf(a, fft=True)                     # stats FFT

## -- End pasted text --
100 loops, best of 3: 7.18 ms per loop
100 loops, best of 3: 2.15 ms per loop
10 loops, best of 3: 88.3 ms per loop
10 loops, best of 3: 87.6 ms per loop
100 loops, best of 3: 3.33 ms per loop

Edit... I checked again, keeping l=40 and changing n=10000 to n=200000 samples. The FFT methods start to get some traction, and the statsmodels fft implementation just edges it... (the order is the same)

## -- End pasted text --
10 loops, best of 3: 86.2 ms per loop
10 loops, best of 3: 69.5 ms per loop
1 loops, best of 3: 16.2 s per loop
1 loops, best of 3: 16.3 s per loop
10 loops, best of 3: 52.3 ms per loop

Edit 2: I changed my routine and re-tested against the FFT for n=10000 and n=20000

a = correlatedData(n=200000); b=correlatedData(n=10000)
m = a.mean(); rng = np.arange(40); mb = b.mean()
%timeit a1 = map(lambda t:acorr(a, m, t), rng)
%timeit a1 = map(lambda t:acorr.acorr(b, mb, t), rng)
%timeit a4 = acf(a, fft=True)
%timeit a4 = acf(b, fft=True)

10 loops, best of 3: 73.3 ms per loop   # acorr below
100 loops, best of 3: 2.37 ms per loop  # acorr below
10 loops, best of 3: 79.2 ms per loop   # statstools with FFT
100 loops, best of 3: 2.69 ms per loop  # statstools with FFT

Implementation

def acorr(op_samples, mean, separation, norm = 1):
    """autocorrelation of a measured operator with optional normalisation
    the autocorrelation is measured over the 0th axis

    Required Inputs
        op_samples  :: np.ndarray :: the operator samples
        mean        :: float :: the mean of the operator
        separation  :: int :: the separation between HMC steps
        norm        :: float :: the autocorrelation with separation=0
    """
    return ((op_samples[:op_samples.size-separation] - mean)*(op_samples[separation:]- mean)).ravel().mean() / norm

A 4x speedup can be achieved, but you must be careful to pass only op_samples=a.copy(), because otherwise the array a is modified in place by a -= mean.
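The faster variant itself is not preserved in this copy. The following is only a sketch of what an in-place version might look like, assuming it subtracts the mean from op_samples directly; the name acorr_inplace and the sample data are hypothetical. It illustrates why the caller should pass a copy.

```python
import numpy as np

def acorr_inplace(op_samples, mean, separation, norm=1):
    # Hypothetical in-place variant: subtracting the mean here mutates the
    # caller's array, which is why a copy should be passed in.
    op_samples -= mean
    n = op_samples.size
    return (op_samples[:n - separation] * op_samples[separation:]).ravel().mean() / norm

# Made-up samples, for illustration only.
a = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
m = a.mean()
original = a.copy()

c1 = acorr_inplace(a.copy(), m, 1)   # safe: a is left untouched
c1_again = acorr_inplace(a, m, 1)    # unsafe: a now has its mean removed

print(c1 == c1_again, np.allclose(a, original - m))
```

Calling such a function repeatedly on the same array, for example once per separation, would subtract the mean again on every call, so either pass a fresh copy each time or subtract the mean once up front.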

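Sanity check

[image: sanity-check plot]

Example error analysis

This is a bit out of scope, but I have not redone the figure without the integrated autocorrelation time or the integration window calculation. The autocorrelation with errors is clearly visible in the bottom plot.

[image: example error analysis plot]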
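For large n the timings above favour the FFT route. A minimal plain-NumPy sketch of that idea follows, assuming the divide-by-n normalisation. The name acf_fft is hypothetical and this is not the statsmodels implementation.

```python
import numpy as np

def acf_fft(x, nlags):
    # Autocorrelation estimate (divide-by-n convention) computed via FFT.
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    # Zero-pad to a power of two >= 2n - 1 so that the circular correlation
    # produced by the FFT equals the linear correlation for lags 0 .. n-1.
    nfft = 1 << (2 * n - 1).bit_length()
    f = np.fft.rfft(d, n=nfft)
    acov = np.fft.irfft(f * np.conj(f), n=nfft)[:nlags + 1] / n
    return acov / acov[0]

# Made-up signal, for illustration only.
x = np.sin(0.3 * np.arange(200)) + 0.01 * np.arange(200)
rho = acf_fft(x, 40)

# Cross-check against the direct np.correlate computation.
d = x - x.mean()
direct = np.correlate(d, d, mode='full')[-len(x):][:41] / (len(x) * x.var())
print(np.allclose(rho, direct))
```

The direct computation costs O(n^2) operations over all lags, while the FFT version costs O(n log n), which is consistent with the FFT methods only pulling ahead at large sample sizes.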

Answer 4 (score: 1)

I found that this gives the expected results with just a slight change:

[code not preserved]

Tested against Excel's autocorrelation results.