Vectorizing NumPy covariance for a 3D array

Time: 2016-11-03 05:59:33

Tags: python numpy multidimensional-array vectorization covariance

I have a 3D numpy array of shape (t,n1,n2):

x=np.random.rand(10,2,4)

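I need to compute another 3D array y of shape (t,n1,n1) such that:

y[0] = np.cov(x[0,:,:])

and so on for all slices along the first axis.

So a loop implementation would be:

y=np.zeros((10,2,2))
for i in np.arange(x.shape[0]):
    y[i]=np.cov(x[i,:,:])

Is there a way to vectorize this so that I can compute all the covariance matrices in one go? I tried, but my attempt did not work.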

1 answer:

Answer 0: (score: 2)

I dug into the numpy.cov source code and tried it with the default parameters. It turns out that np.cov(x[i,:,:]) is simply:

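N = x.shape[2]  # number of observations per variable
m = x[i,:,:]
m = m - np.sum(m, axis=1, keepdims=True) / N  # center each row; not in place, so x stays unchanged
cov = np.dot(m, m.T) / (N - 1)  # N - 1 normalization, the np.cov default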
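
As a quick sanity check of the steps above, here is a minimal sketch that compares them with np.cov on a single slice. The seed, the slice index i = 3, and the names manual_cov and matches are assumptions made for illustration. It also uses mean() rather than sum() / N, which should be equivalent here.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
x = rng.random((10, 2, 4))       # same shape as in the question
i = 3                            # any valid slice index works

N = x.shape[2]
# Center each row (variable) around its mean; this creates a new array,
# so the original x is left untouched.
m = x[i] - x[i].mean(axis=1, keepdims=True)
manual_cov = m @ m.T / (N - 1)

# Compare against NumPy's own result for the same slice.
matches = np.allclose(manual_cov, np.cov(x[i]))
```

If the steps are right, matches should be True and manual_cov should have shape (n1, n1).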

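So the task is to vectorize this loop, which iterates over i, and process all the data in x in one go. The mean subtraction in the third step can be done for all slices at once with broadcasting. The last step is a sum-reduction over the last axis, carried out independently for every slice along the first axis. This can be implemented efficiently in a vectorized way with np.einsum. The final implementation is therefore: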

N = x.shape[2]
m1 = x - x.sum(2, keepdims=True) / N  # broadcast-subtract each row's mean
y_out = np.einsum('ijk,ilk->ijl', m1, m1) / (N - 1)  # for each slice i: m1[i] @ m1[i].T
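
The implementation above can be wrapped in a function and checked against the loop version. The following is a sketch under stated assumptions, not part of the original answer: the function names batched_cov and batched_cov_matmul are hypothetical, and the matmul variant is an alternative formulation that I would expect to give the same result as the einsum call.

```python
import numpy as np

def batched_cov(x):
    """Covariance of every x[i]; rows are variables, columns are observations."""
    n_obs = x.shape[2]
    centered = x - x.mean(axis=2, keepdims=True)   # broadcasting over all slices
    # Sum over the observation axis k, separately for each slice i.
    return np.einsum('ijk,ilk->ijl', centered, centered) / (n_obs - 1)

def batched_cov_matmul(x):
    """Same computation, written as a batched matrix product (an alternative)."""
    n_obs = x.shape[2]
    centered = x - x.mean(axis=2, keepdims=True)
    return centered @ centered.transpose(0, 2, 1) / (n_obs - 1)

x = np.random.default_rng(0).random((10, 2, 4))
y_loop = np.stack([np.cov(x[i]) for i in range(x.shape[0])])  # reference result
y_einsum = batched_cov(x)
y_matmul = batched_cov_matmul(x)
```

Both functions read n1 and n2 from the shape of x, so nothing is hard-coded to the (10, 2, 4) example.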

Runtime test

In [155]: def original_app(x):
     ...:     n = x.shape[0]
     ...:     y = np.zeros((n,2,2))
     ...:     for i in np.arange(x.shape[0]):
     ...:         y[i]=np.cov(x[i,:,:])
     ...:     return y
     ...: 
     ...: def proposed_app(x):
     ...:     N = x.shape[2]
     ...:     m1 = x - x.sum(2,keepdims=1)/N
     ...:     out = np.einsum('ijk,ilk->ijl',m1,m1)  / (N - 1)
     ...:     return out
     ...: 

In [156]: # Setup inputs
     ...: n = 10000
     ...: x = np.random.rand(n,2,4)
     ...: 

In [157]: np.allclose(original_app(x),proposed_app(x))
Out[157]: True  # Results verified

In [158]: %timeit original_app(x)
1 loops, best of 3: 610 ms per loop

In [159]: %timeit proposed_app(x)
100 loops, best of 3: 6.32 ms per loop
That is a huge speedup: roughly 96x on this input.
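
To reproduce a comparison like this outside IPython, the standard-library timeit module can stand in for %timeit. This is a rough sketch: the function names loop_cov and einsum_cov, the smaller input size, and the repeat counts are assumptions chosen to keep the run short. Absolute timings will differ by machine and NumPy version.

```python
import timeit
import numpy as np

def loop_cov(x):
    # One np.cov call per slice along the first axis.
    return np.stack([np.cov(x[i]) for i in range(x.shape[0])])

def einsum_cov(x):
    # Vectorized version: center with broadcasting, reduce with einsum.
    n_obs = x.shape[2]
    m1 = x - x.sum(axis=2, keepdims=True) / n_obs
    return np.einsum('ijk,ilk->ijl', m1, m1) / (n_obs - 1)

x = np.random.default_rng(0).random((1000, 2, 4))
same = np.allclose(loop_cov(x), einsum_cov(x))   # verify before timing

runs = 3
# Best of three repeats, divided by the number of calls per repeat.
t_loop = min(timeit.repeat(lambda: loop_cov(x), number=runs, repeat=3)) / runs
t_einsum = min(timeit.repeat(lambda: einsum_cov(x), number=runs, repeat=3)) / runs
print(f"loop: {t_loop * 1e3:.2f} ms, einsum: {t_einsum * 1e3:.2f} ms")
```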