I'm new to numpy and have been stuck on this problem. I have two 2-D numpy arrays, for example:
import numpy
x = numpy.random.random((10, 5))
y = numpy.random.random((10, 5))
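I want to use the numpy cov function to find the covariance of these two ndarrays row by row. In the example above, the output array should contain 10 elements, each being the covariance of the corresponding rows of the two ndarrays. I know I can do this by looping over the rows and computing the covariance of two 1-D arrays, but that is not Pythonic.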
Edit 1: By the covariance of two arrays I mean the element at index [0, 1] of the matrix returned by numpy.cov.
Edit 2: This is my current implementation:
s = numpy.empty((x.shape[0], 1))
for i in range(x.shape[0]):
    s[i] = numpy.cov(x[i], y[i])[0][1]
Answer 0 (score: 2)
Use the definition of covariance: E(XY) - E(X)E(Y).
import numpy as np
x = np.random.random((10, 5))
y = np.random.random((10, 5))
n = x.shape[1]
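# Biased estimate (divides by n): E[XY] - E[X]E[Y], computed per row
cov_bias = np.mean(x * y, axis=1) - np.mean(x, axis=1) * np.mean(y, axis=1)
# Rescale to the unbiased estimate (divides by n - 1), the numpy.cov default
cov_bias * n / (n - 1)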
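Note that cov_bias corresponds to the result of numpy.cov with bias=True (normalization by n), while the rescaled value cov_bias * n / (n-1) corresponds to the default numpy.cov result (normalization by n - 1).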
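As a quick sanity check of the claim above, here is a minimal sketch that compares the definition-based formula with numpy.cov applied row by row. The seed and the names cov_unbiased, ref_bias and ref_unbiased are illustrative assumptions, not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
x = rng.random((10, 5))
y = rng.random((10, 5))
n = x.shape[1]

# Biased (divide by n) row-wise covariance from E[XY] - E[X]E[Y]
cov_bias = np.mean(x * y, axis=1) - np.mean(x, axis=1) * np.mean(y, axis=1)
# Rescaled to the unbiased (divide by n - 1) estimate
cov_unbiased = cov_bias * n / (n - 1)

# Reference values from the row-by-row loop over numpy.cov
ref_bias = np.array([np.cov(xi, yi, bias=True)[0, 1] for xi, yi in zip(x, y)])
ref_unbiased = np.array([np.cov(xi, yi)[0, 1] for xi, yi in zip(x, y)])

print(np.allclose(cov_bias, ref_bias))
print(np.allclose(cov_unbiased, ref_unbiased))
```

One caveat worth keeping in mind: E(XY) - E(X)E(Y) subtracts two numbers of similar size, so for data with a large mean relative to its spread it may lose more precision than centering the rows first.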
Answer 1 (score: 0)
This works, but I'm not sure it is faster for large matrices x and y, because the call numpy.cov(x, y) computes many entries that we then discard with numpy.diag:
import numpy
x = numpy.random.random((10, 5))
y = numpy.random.random((10, 5))
# with loop
for (xi, yi) in zip(x, y):
    print(numpy.cov(xi, yi)[0][1])
# vectorized: the diagonal at offset x.shape[0] holds cov(x[i], y[i])
cov_mat = numpy.cov(x, y)
covariances = numpy.diag(cov_mat, x.shape[0])
print(covariances)
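Why the offset x.shape[0] picks out the right entries may not be obvious, so here is a small sketch of the block structure of numpy.cov(x, y). The array sizes and the names n_rows, cross_block and rowwise are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((4, 6))
y = rng.random((4, 6))
n_rows = x.shape[0]

# numpy.cov(x, y) stacks the rows of x and y as 2 * n_rows variables,
# so the result has shape (2 * n_rows, 2 * n_rows).
cov_mat = np.cov(x, y)

# The upper-right block holds cov(x[i], y[j]) for every pair (i, j).
cross_block = cov_mat[:n_rows, n_rows:]

# The diagonal of cov_mat at offset n_rows is the main diagonal of that
# block, i.e. cov(x[i], y[i]) for each row i.
rowwise = np.diag(cov_mat, n_rows)

print(cov_mat.shape)
print(np.allclose(rowwise, np.diag(cross_block)))
```

Only n_rows of the 4 * n_rows**2 computed entries are kept, which is the waste the answer refers to.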
I also did some timings for square matrices of size n x n:
import time
import numpy

def run(n):
    x = numpy.random.random((n, n))
    y = numpy.random.random((n, n))
    started = time.time()
    for (xi, yi) in zip(x, y):
        numpy.cov(xi, yi)[0][1]
    needed_loop = time.time() - started
    started = time.time()
    cov_mat = numpy.cov(x, y)
    covariances = numpy.diag(cov_mat, x.shape[0])
    needed_vectorized = time.time() - started
    print(
        f"n={n:4d} needed_loop={needed_loop:.3f} s "
        f"needed_vectorized={needed_vectorized:.3f} s"
    )

for n in (100, 200, 500, 600, 700, 1000, 2000, 3000):
    run(n)
The output on my slow MacBook Air is:
n= 100 needed_loop=0.006 s needed_vectorized=0.001 s
n= 200 needed_loop=0.011 s needed_vectorized=0.003 s
n= 500 needed_loop=0.033 s needed_vectorized=0.023 s
n= 600 needed_loop=0.041 s needed_vectorized=0.039 s
n= 700 needed_loop=0.043 s needed_vectorized=0.049 s
n=1000 needed_loop=0.061 s needed_vectorized=0.130 s
n=2000 needed_loop=0.137 s needed_vectorized=0.742 s
n=3000 needed_loop=0.224 s needed_vectorized=2.264 s
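So the break-even point is at around n=600. Beyond that the loop is faster, because numpy.cov(x, y) builds a full 2n x 2n covariance matrix of which only n entries are used.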
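To repeat this comparison on another machine, a sketch along the following lines could be used. It relies on timeit rather than time.time, and it adds a third variant, centered_cov, which is a hypothetical helper following the definition-based idea from the other answers. All function names and the sizes in the final loop are assumptions, and no timings are claimed here.

```python
import timeit
import numpy as np

def loop_cov(x, y):
    # One numpy.cov call per pair of rows
    return np.array([np.cov(xi, yi)[0, 1] for xi, yi in zip(x, y)])

def diag_cov(x, y):
    # Full (2n x 2n) covariance matrix, then keep one off-diagonal
    return np.diag(np.cov(x, y), x.shape[0])

def centered_cov(x, y):
    # Center each row, then take the row-wise dot product
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    return (xc * yc).sum(axis=1) / (x.shape[1] - 1)

def compare(n, repeats=3):
    # Return the best-of-`repeats` wall time in seconds for each variant
    rng = np.random.default_rng(0)
    x = rng.random((n, n))
    y = rng.random((n, n))
    timings = {}
    for func in (loop_cov, diag_cov, centered_cov):
        timings[func.__name__] = min(
            timeit.repeat(lambda: func(x, y), number=1, repeat=repeats)
        )
    return timings

for n in (100, 200):
    print(n, compare(n))
```

Taking the minimum over a few repeats is a common way to reduce noise from other processes; the absolute numbers will still depend on the machine and the BLAS build numpy is linked against.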
Answer 2 (score: 0)
Here's one that uses the definition of covariance and is inspired by corr2_coeff_rowwise -
import numpy as np

def covariance_rowwise(A,B):
    # Rowwise mean of input arrays & subtract from input arrays themselves
    A_mA = A - A.mean(-1, keepdims=True)
    B_mB = B - B.mean(-1, keepdims=True)
    # Finally get covariance
    N = A.shape[1]
    return np.einsum('ij,ij->i',A_mA,B_mB)/(N-1)
Sample run -
In [66]: np.random.seed(0)
    ...: x = np.random.random((10, 5))
    ...: y = np.random.random((10, 5))

In [67]: s = np.empty((x.shape[0]))
    ...: for i in range(x.shape[0]):
    ...:     s[i] = np.cov(x[i], y[i])[0][1]

In [68]: np.allclose(covariance_rowwise(x,y),s)
Out[68]: True
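For readers unfamiliar with the einsum subscripts used above, 'ij,ij->i' multiplies the two arrays elementwise and sums over the column index j, giving one dot product per row. A minimal sketch, with array names chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((10, 5))
b = rng.random((10, 5))

# Row-wise dot product written with einsum ...
rowwise_einsum = np.einsum('ij,ij->i', a, b)
# ... and the same quantity written as multiply-then-sum
rowwise_sum = (a * b).sum(axis=1)

print(rowwise_einsum.shape)
print(np.allclose(rowwise_einsum, rowwise_sum))
```

The einsum form avoids materializing the intermediate product a * b, which is presumably why the answer prefers it.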
Answer 3 (score: 0)
Take the diagonal of cov(x, y) at offset x.shape[0] and expand its dimensions:
numpy.expand_dims(numpy.diag(numpy.cov(x,y),x.shape[0]),1)
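The expand_dims call turns the length-10 vector into a column of shape (10, 1), which is the shape the loop in the question produces. A short sketch comparing the two, with the seed and the name col being assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((10, 5))
y = rng.random((10, 5))

# Loop from the question, filling a column vector of shape (10, 1)
s = np.empty((x.shape[0], 1))
for i in range(x.shape[0]):
    s[i] = np.cov(x[i], y[i])[0][1]

# One-liner from this answer
col = np.expand_dims(np.diag(np.cov(x, y), x.shape[0]), 1)

print(col.shape)
print(np.allclose(col, s))
```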