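I am trying to implement PCA in Python. My goal is a version that behaves like Matlab's PCA implementation. However, I think I am missing a crucial point, because my tests produce results in which some values have the wrong sign (+/-).
Can you find the mistake in the algorithm? Why do the signs sometimes differ?
Eigenvector-based PCA implementation: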
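import numpy as np

# The function name is assumed; the original signature was not shown.
def pca(A, new_array_rank=4):
    # Center each column (feature) on its mean
    A_mean = np.mean(A, axis=0)
    A = A - A_mean
    # Covariance matrix of the features and its eigendecomposition
    covariance_matrix = np.cov(A.T)
    eigen_values, eigen_vectors = np.linalg.eig(covariance_matrix)
    # Sort the eigenpairs by decreasing eigenvalue
    new_index = np.argsort(eigen_values)[::-1]
    eigen_vectors = eigen_vectors[:, new_index]
    eigen_values = eigen_values[new_index]
    # Keep the leading components and project the centered data onto them
    eigen_vectors = eigen_vectors[:, :new_array_rank]
    return np.dot(eigen_vectors.T, A.T).T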
My test values:
array([[ 0.13298325, 0.2896928 , 0.53589224, 0.58164269, 0.66202221,
0.95414116, 0.03040784, 0.26290471, 0.40823539, 0.37783385],
[ 0.90521267, 0.86275498, 0.52696221, 0.15243867, 0.20894357,
0.19900414, 0.50607341, 0.53995902, 0.32014539, 0.98744942],
[ 0.87689087, 0.04307512, 0.45065793, 0.29415066, 0.04908066,
0.98635538, 0.52091338, 0.76291385, 0.97213094, 0.48815925],
[ 0.75136801, 0.85946751, 0.10508436, 0.04656418, 0.08164919,
0.88129981, 0.39666754, 0.86325704, 0.56718669, 0.76346602],
[ 0.93319721, 0.5897521 , 0.75065047, 0.63916306, 0.78810679,
0.92909485, 0.23751963, 0.87552313, 0.37663086, 0.69010429],
[ 0.53189229, 0.68984247, 0.46164066, 0.29953259, 0.10826334,
0.47944168, 0.93935082, 0.40331874, 0.18541041, 0.35594587],
[ 0.36399075, 0.00698617, 0.61030608, 0.51136309, 0.54185601,
0.81383604, 0.50003674, 0.75414875, 0.54689801, 0.9957493 ],
[ 0.27815017, 0.65417397, 0.57207255, 0.54388744, 0.89128334,
0.3512483 , 0.94441934, 0.05305929, 0.77389942, 0.93125228],
[ 0.80409485, 0.2749575 , 0.22270875, 0.91869706, 0.54683128,
0.61501493, 0.7830902 , 0.72055598, 0.09363186, 0.05103846],
[ 0.12357816, 0.29758902, 0.87807485, 0.94348706, 0.60896429,
0.33899019, 0.36310027, 0.02380186, 0.67207071, 0.28638936]])
The result of my eigenvector-based PCA:
array([[ 5.09548931e-01, -3.97079651e-01, -1.47555867e-01,
-3.55343967e-02, -4.92125732e-01, -1.78191399e-01,
-3.29543974e-02, 3.71406504e-03, 1.06404170e-01,
-1.66533454e-16],
[ -5.15879041e-01, 6.40833419e-01, -7.54601587e-02,
-2.00776798e-01, -7.07247669e-02, 2.68582368e-01,
-1.66124362e-01, 1.03414828e-01, 7.76738500e-02,
5.55111512e-17],
[ -4.42659342e-01, -5.13297786e-01, -1.65477203e-01,
5.33670847e-01, 2.00194213e-01, 2.06176265e-01,
1.31558875e-01, -2.81699724e-02, 6.19571305e-02,
-8.32667268e-17],
[ -8.50397468e-01, 5.14319846e-02, -1.46289906e-01,
6.51133920e-02, -2.83887201e-01, -1.90516618e-01,
1.45748370e-01, 9.49464768e-02, -1.05989648e-01,
4.16333634e-17],
[ -1.61040296e-01, -3.47929944e-01, -1.19871598e-01,
-6.48965493e-01, 7.53188055e-02, 1.31730340e-01,
1.33229858e-01, -1.43587499e-01, -2.20913989e-02,
-3.40005801e-16],
[ -1.70017435e-01, 4.22573148e-01, 4.81511942e-01,
2.42170125e-01, -1.18575764e-01, -6.87250591e-02,
-1.20660307e-01, -2.22865482e-01, -1.73666882e-02,
-1.52655666e-16],
[ 6.90841779e-02, -2.86233901e-01, -4.16612350e-01,
9.38935057e-03, 3.02325120e-01, -1.61783482e-01,
-3.55465509e-01, 1.15323059e-02, -5.04619674e-02,
4.71844785e-16],
[ 5.26189089e-01, 6.81324113e-01, -2.89960115e-01,
2.01781673e-02, 3.03159463e-01, -2.11777986e-01,
2.25937548e-01, -5.49219872e-05, 3.66268329e-02,
-1.11022302e-16],
[ 6.68680313e-02, -2.99715813e-01, 8.53428694e-01,
-1.30066853e-01, 2.31410283e-01, -1.02860624e-01,
1.95449586e-02, 1.30218425e-01, 1.68059569e-02,
2.22044605e-16],
[ 9.68303353e-01, 4.80944309e-02, 2.62865615e-02,
1.44821658e-01, -1.47094421e-01, 3.07366196e-01,
1.91849667e-02, 5.08517759e-02, -1.03558238e-01,
1.38777878e-16]])
The result for the same data using Matlab's PCA function:
array([[ -5.09548931e-01, 3.97079651e-01, 1.47555867e-01,
3.55343967e-02, -4.92125732e-01, -1.78191399e-01,
-3.29543974e-02, -3.71406504e-03, -1.06404170e-01,
-0.00000000e+00],
[ 5.15879041e-01, -6.40833419e-01, 7.54601587e-02,
2.00776798e-01, -7.07247669e-02, 2.68582368e-01,
-1.66124362e-01, -1.03414828e-01, -7.76738500e-02,
-0.00000000e+00],
[ 4.42659342e-01, 5.13297786e-01, 1.65477203e-01,
-5.33670847e-01, 2.00194213e-01, 2.06176265e-01,
1.31558875e-01, 2.81699724e-02, -6.19571305e-02,
-0.00000000e+00],
[ 8.50397468e-01, -5.14319846e-02, 1.46289906e-01,
-6.51133920e-02, -2.83887201e-01, -1.90516618e-01,
1.45748370e-01, -9.49464768e-02, 1.05989648e-01,
-0.00000000e+00],
[ 1.61040296e-01, 3.47929944e-01, 1.19871598e-01,
6.48965493e-01, 7.53188055e-02, 1.31730340e-01,
1.33229858e-01, 1.43587499e-01, 2.20913989e-02,
-0.00000000e+00],
[ 1.70017435e-01, -4.22573148e-01, -4.81511942e-01,
-2.42170125e-01, -1.18575764e-01, -6.87250591e-02,
-1.20660307e-01, 2.22865482e-01, 1.73666882e-02,
-0.00000000e+00],
[ -6.90841779e-02, 2.86233901e-01, 4.16612350e-01,
-9.38935057e-03, 3.02325120e-01, -1.61783482e-01,
-3.55465509e-01, -1.15323059e-02, 5.04619674e-02,
-0.00000000e+00],
[ -5.26189089e-01, -6.81324113e-01, 2.89960115e-01,
-2.01781673e-02, 3.03159463e-01, -2.11777986e-01,
2.25937548e-01, 5.49219872e-05, -3.66268329e-02,
-0.00000000e+00],
[ -6.68680313e-02, 2.99715813e-01, -8.53428694e-01,
1.30066853e-01, 2.31410283e-01, -1.02860624e-01,
1.95449586e-02, -1.30218425e-01, -1.68059569e-02,
-0.00000000e+00],
[ -9.68303353e-01, -4.80944309e-02, -2.62865615e-02,
-1.44821658e-01, -1.47094421e-01, 3.07366196e-01,
1.91849667e-02, -5.08517759e-02, 1.03558238e-01,
-0.00000000e+00]])
Answer 0: (score: 2)
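The sign of an eigenvector, like other normalization choices, is arbitrary: if v is an eigenvector, so is -v. Matlab and numpy normalize the eigenvectors in the same way, but the sign is arbitrary and can depend on details of the linear algebra library being used.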
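To make the sign ambiguity concrete, here is a minimal sketch. The random data, the seed, and the variable names are assumptions for illustration only; `np.linalg.eigh` is used here because a covariance matrix is symmetric, which is a substitution for the `np.linalg.eig` call in the question.

```python
import numpy as np

# Illustrative data only; not the data from the question.
rng = np.random.default_rng(0)
X = rng.random((10, 4))
Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False)

# eigh returns eigenvalues in ascending order for a symmetric matrix.
eigen_values, eigen_vectors = np.linalg.eigh(C)
lam = eigen_values[-1]
v = eigen_vectors[:, -1]

# Both v and -v satisfy C v = lam v, and both have unit length.
print(np.allclose(C @ v, lam * v))
print(np.allclose(C @ (-v), lam * (-v)))
print(np.isclose(np.linalg.norm(v), 1.0))

# Flipping the eigenvector flips the whole column of projected scores.
print(np.allclose(Xc @ (-v), -(Xc @ v)))
```

Because the flip applies to a whole eigenvector, two valid PCA outputs can differ by a sign in entire columns of the score matrix.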
When I wrote the numpy equivalent of Matlab's princomp, I simply normalized the signs of the eigenvectors when comparing them against Matlab's in my unit tests.
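A rough sketch of what such a sign normalization could look like in a test. The helper names `normalize_signs` and `equal_up_to_sign` are hypothetical, and the rule of making the largest-magnitude entry of each column positive is just one possible convention, not necessarily the one used in the code mentioned above.

```python
import numpy as np

def normalize_signs(vectors):
    """Flip each column so that its largest-magnitude entry is positive."""
    vectors = np.asarray(vectors, dtype=float)
    rows = np.argmax(np.abs(vectors), axis=0)
    signs = np.sign(vectors[rows, np.arange(vectors.shape[1])])
    signs[signs == 0] = 1.0  # leave all-zero columns unchanged
    return vectors * signs

def equal_up_to_sign(a, b, atol=1e-8):
    """True if every column of a matches the same column of b or its negation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    same = np.isclose(a, b, atol=atol).all(axis=0)
    flipped = np.isclose(a, -b, atol=atol).all(axis=0)
    return bool(np.all(same | flipped))

# Small made-up example: b is a with its first column negated.
a = np.array([[0.5, -0.4],
              [-0.8, 0.6]])
b = a * np.array([-1.0, 1.0])

print(np.allclose(a, b))                                    # plain comparison fails
print(equal_up_to_sign(a, b))                               # column-wise comparison passes
print(np.allclose(normalize_signs(a), normalize_signs(b)))  # so does comparing normalized copies
```

Either approach works in a unit test. Note that a "largest entry is positive" rule can become unstable when two entries of a column have nearly equal magnitude, in which case the direct column-wise comparison is the safer choice.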