Computing the covariance matrix - difference between numpy.cov and numpy.dot?

Time: 2012-09-13 04:14:59

Tags: python numpy covariance pca

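I am working with a 3-D numpy array on which I will eventually perform PCA. I first flatten the 3-D array to 2-D so that I can compute the covariance (and then the eigenvalues and eigenvectors).

When computing the covariance matrix, I get different results from numpy.cov and numpy.dot. If my 2-D array has shape (5, 9), I want to end up with a 5x5 (i.e. NxN) covariance matrix. That is what I get using numpy.dot. Using numpy.cov, I end up with a 9x9 covariance matrix. That does not fit my needs, but honestly I don't know which one is correct. I have seen both approaches used to compute the covariance in the examples I have studied.

If I pass the numpy.dot and numpy.cov results through numpy.linalg.eig, I obviously get different answers (all output is printed below). So at this point I am confused about which approach is correct, or where I may have gone wrong.

Here is the test code with its output. Thanks for your help.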
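The shapes described above can be reproduced with a minimal sketch. The seeded random array `b` here is a hypothetical stand-in for the flattened (5, 9) data, not the real input:

```python
import numpy as np

rng = np.random.default_rng(0)
b = rng.random((5, 9))  # hypothetical stand-in for the flattened array

cov_rows = np.cov(b)                # default rowvar=True: rows are variables
cov_cols = np.cov(b, rowvar=False)  # columns are variables
gram = np.dot(b, b.T)               # plain matrix product, no normalization

print(cov_rows.shape, cov_cols.shape, gram.shape)
```

Which orientation is appropriate depends on whether the rows or the columns of `b` are treated as the variables.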

import numpy as np

a = np.random.random((5, 3, 3))  # example of what real input will look like

# create 2D flattened version of 3D input array
d1, d2, d3 = a.shape
b = np.zeros([d1, d2 * d3])
for i in range(len(a)):
    b[i] = a[i].flatten()

print("shape of 3D array: ", a.shape)
print("shape of flattened 2D array: ", b.shape, "\n")
print("flattened 2D array:\n", b, "\n")

# mean-center the flattened array (subtracts each column's mean)
b -= np.mean(b, axis=0)

# calculate the covariance matrix of the flattened array
covar1 = np.cov(b, rowvar=0)   # this makes a 9x9 array
covar2 = np.dot(b, b.T)        # this makes a 5x5 array

print("covariance via numpy.cov:\n", covar1, "\n")
print("covariance via numpy.dot:\n", covar2, "\n")

# calculate eigenvalues and eigenvectors
eval1, evec1 = np.linalg.eig(covar1)
eval2, evec2 = np.linalg.eig(covar2)

print("eigenvalues via numpy.cov covariance matrix:\n", eval1, "\n")
print("eigenvectors via numpy.cov covariance matrix:\n", evec1, "\n")
print("eigenvalues via numpy.dot covariance matrix:\n", eval2, "\n")
print("eigenvectors via numpy.dot covariance matrix:\n", evec2, "\n")


======= Output =======

shape of 3D array:  (5, 3, 3)
shape of flattened 2D array:  (5, 9)

flattened 2D array:
[[ 0.94964127  0.71015973  0.80994774  0.49727821  0.38270025  0.89136202
   0.19876615  0.72461047  0.43646456]
 [ 0.00502329  0.70593521  0.44001479  0.97576486  0.37261107  0.6318449
   0.86301405  0.21820704  0.91507706]
 [ 0.75411747  0.98462782  0.65109776  0.1083943   0.12867679  0.63172813
   0.85803498  0.89507165  0.62291308]
 [ 0.88589874  0.02797773  0.6421045   0.17255432  0.5713524   0.28589519
   0.55888288  0.7961657   0.4453764 ]
 [ 0.85774793  0.19511453  0.92167001  0.27340606  0.41849435  0.98349776
   0.19354437  0.2974041   0.52064868]]

covariance via numpy.cov():
[[ 0.15180806 -0.04977355  0.05733885 -0.11340765  0.00840097  0.01461576
  -0.08596712  0.07512366 -0.07509614]
 [-0.04977355  0.15853367 -0.02337953  0.0357429  -0.05604085  0.02600021
   0.06158462  0.0229808   0.03506849]
 [ 0.05733885 -0.02337953  0.0335786  -0.03485899  0.00294469  0.03209583
  -0.05378417  0.00490397 -0.02751816]
 [-0.11340765  0.0357429  -0.03485899  0.12340238  0.0052609   0.0144986
   0.02494029 -0.07492008  0.05109007]
 [ 0.00840097 -0.05604085  0.00294469  0.0052609   0.02529647 -0.01263607
  -0.02327657 -0.01136774 -0.01037048]
 [ 0.01461576  0.02600021  0.03209583  0.0144986  -0.01263607  0.07415853
  -0.05387152 -0.0345835  -0.00342481]
 [-0.08596712  0.06158462 -0.05378417  0.02494029 -0.02327657 -0.05387152
   0.11053971  0.00903926  0.04727671]
 [ 0.07512366  0.0229808   0.00490397 -0.07492008 -0.01136774 -0.0345835
   0.00903926  0.09436665 -0.03526195]
 [-0.07509614  0.03506849 -0.02751816  0.05109007 -0.01037048 -0.00342481
   0.04727671 -0.03526195  0.03900974]]

covariance via numpy.dot():
[[ 0.3211555  -0.34304471 -0.01453859 -0.1071505   0.14357829]
 [-0.34304471  1.24506647 -0.11174019 -0.43907983 -0.35120174]
 [-0.01453859 -0.11174019  0.57018674 -0.10412646 -0.3397815 ]
 [-0.1071505  -0.43907983 -0.10412646  0.60465919  0.0456976 ]
 [ 0.14357829 -0.35120174 -0.3397815   0.0456976   0.50170735]]

eigenvalues via numpy.cov covariance matrix:
[  3.34339027e-01 +0.00000000e+00j   1.98268985e-01 +0.00000000e+00j
   5.71434551e-02 +0.00000000e+00j   1.13399310e-01 +0.00000000e+00j
   3.38418299e-18 +1.46714498e-17j   3.38418299e-18 -1.46714498e-17j
   1.20944017e-18 +0.00000000e+00j  -8.89005842e-18 +0.00000000e+00j
  -6.59244508e-18 +0.00000000e+00j]

eigenvectors via numpy.cov covariance matrix:
[[-0.33898927+0.j          0.01567746+0.j         -0.32410513+0.j
   0.01868249+0.j          0.03901578-0.09858459j  0.03901578+0.09858459j
  -0.17596347+0.j          0.08294235+0.j          0.04883282+0.j        ]
 [ 0.03740184+0.j         -0.01106985+0.j          0.11199662+0.j
  -0.36257285+0.j          0.66513867+0.j          0.66513867+0.j
   0.34810753+0.j         -0.05174886+0.j         -0.21147240+0.j        ]
 [ 0.42193056+0.j          0.10153367+0.j         -0.52774125+0.j
  -0.57292678+0.j         -0.02584078-0.15425679j -0.02584078+0.15425679j
  -0.02594397+0.j         -0.23132722+0.j         -0.33824532+0.j        ]
 [-0.08723679+0.j         -0.17700647+0.j         -0.04490487+0.j
   0.14531440+0.j         -0.08669754+0.21485879j -0.08669754-0.21485879j
  -0.73208352+0.j          0.04474123+0.j         -0.09159437+0.j        ]
 [-0.26991334+0.j          0.39182156+0.j          0.18023454+0.j
  -0.14727224+0.j         -0.21261400+0.1100362j  -0.21261400-0.1100362j
   0.15211635+0.j          0.54168898+0.j         -0.36386803+0.j        ]
 [-0.39361702+0.j          0.48389127+0.j          0.12668909+0.j
   0.07739853+0.j          0.31569702-0.34166187j  0.31569702+0.34166187j
   0.11287735+0.j         -0.74889136+0.j         -0.42472067+0.j        ]
 [-0.29962418+0.j         -0.01577641+0.j          0.35742257+0.j
  -0.68969822+0.j         -0.28182091+0.13998238j -0.28182091-0.13998238j
  -0.40124817+0.j          0.06419507+0.j          0.47506061+0.j        ]
 [-0.57032501+0.j         -0.60505095+0.j         -0.30688172+0.j
  -0.11823642+0.j          0.07618472-0.0915626j   0.07618472+0.0915626j
   0.32272841+0.j         -0.10872383+0.j         -0.25867852+0.j        ]
 [-0.23498699+0.j          0.45164240+0.j         -0.57569388+0.j
   0.03856674+0.j         -0.07478874+0.27512969j -0.07478874-0.27512969j
  -0.10101603+0.j          0.25440413+0.j          0.47403650+0.j        ]]

eigenvalues via numpy.dot covariance matrix:
[  1.33735611e+00   7.93075942e-01   2.08276008e-16   4.53597239e-01
   2.28573820e-01]

eigenvectors via numpy.dot covariance matrix:
[[ 0.1223889  -0.87441162 -0.4472136  -0.13172011  0.05545353]
 [-0.54658696  0.08157704 -0.4472136   0.61361759  0.34360056]
 [ 0.70163289  0.24699239 -0.4472136   0.41717057 -0.26958257]
 [-0.41754523  0.17603863 -0.4472136  -0.33135976 -0.69632398]
 [ 0.1401104   0.36980356 -0.4472136  -0.56770828  0.56685246]]
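The two sets of eigenvalues printed above are related: each nonzero eigenvalue from the numpy.dot matrix is 4 times the matching one from the numpy.cov matrix (for example 1.33735611 / 4 ≈ 0.334339027), and 4 is the number of rows minus one. The sketch below illustrates that relationship on a hypothetical seeded random array rather than the data above. It uses `np.linalg.eigvalsh`, which assumes a symmetric matrix and returns real eigenvalues, instead of the `np.linalg.eig` call in the question; that substitution is a choice made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
b = rng.random((5, 9))      # hypothetical stand-in for the flattened array
b = b - b.mean(axis=0)      # center each column, as in the question

n = b.shape[0]
cov_9x9 = np.cov(b, rowvar=False)  # equals b.T @ b / (n - 1) for centered b
gram_5x5 = b @ b.T                 # the matrix the question builds with np.dot

# eigvalsh returns real eigenvalues in ascending order; reverse to descending
ev_cov = np.linalg.eigvalsh(cov_9x9)[::-1]
ev_gram = np.linalg.eigvalsh(gram_5x5)[::-1]

# Column-centering leaves b with rank at most n - 1, so compare the top n - 1
k = n - 1
print(np.allclose(ev_cov[:k], ev_gram[:k] / (n - 1)))
```

The remaining eigenvalues of both matrices are zero up to floating-point error, which is consistent with the values of order 1e-16 to 1e-18 in the output above.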

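1 Answer:

Answer 0 (score: 4)

np.dot is just the matrix product of two matrices. It is not the covariance. Why are you using rowvar=0? If you simply call np.cov(b), it will give you a matrix of the correct dimensions.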
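To make the difference concrete, here is a minimal sketch of what `np.cov(b)` computes with its default settings, written out with an explicit matrix product. The array `b` is a hypothetical seeded stand-in, and `centered` and `manual` are names introduced only for this illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
b = rng.random((5, 9))  # hypothetical stand-in for the flattened array

# With the default rowvar=True, each row is a variable and each column
# is an observation, so the mean is taken along each row.
centered = b - b.mean(axis=1, keepdims=True)

# Divide by (number of observations - 1), matching np.cov's default ddof
manual = centered @ centered.T / (b.shape[1] - 1)

print(manual.shape)
print(np.allclose(manual, np.cov(b)))
```

A bare `np.dot(b, b.T)` skips the division, and in the question the centering subtracts column means (`axis=0`) rather than row means, so that product does not reproduce `np.cov(b)` even up to a constant factor.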