Python PCA - projection into a lower-dimensional space

Date: 2016-04-21 13:43:53

Tags: python numpy matrix projection pca

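I am trying to implement PCA. The intermediate results (eigenvalues and eigenvectors) come out correct, but when I project the 3-dimensional data onto the 2D principal component space, the result is wrong. I have spent a lot of time comparing my code with other implementations, for example: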

http://sebastianraschka.com/Articles/2014_pca_step_by_step.html

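After a long time without progress I still cannot find the mistake. Since the intermediate results are correct, I assume it is a simple coding error. Thanks in advance to anyone who reads this question, and to those who leave helpful comments or answers.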

My code is as follows:

import numpy as np

class PCA():
    def __init__(self, X):
        # center the data
        X = X - X.mean(axis=0)
        # calculate the covariance matrix of X, where data points are stored in rows
        C = np.cov(X, rowvar=False)
        # get eigenvalues and eigenvectors
        d, u = np.linalg.eigh(C)
        # np.linalg.eigh returns eigenvalues in ascending order, so both
        # eigenvalues and eigenvectors are reversed to get descending order
        self.U = np.asarray(u).T[::-1]
        self.D = d[::-1]

    # **problem starts here**

    def project(self, X, m):
        # use the m eigenvectors with the largest eigenvalues as the transformation matrix
        Z = np.dot(X, np.asmatrix(self.U[:m]).T)
        return Z

The result of my code is:

 myresult
 ([[ 0.03463706, -2.65447128],
   [-1.52656731,  0.20025725],
   [-3.82672364,  0.88865609],
   [ 2.22969475,  0.05126909],
   [-1.56296316, -2.22932369],
   [ 1.59059825,  0.63988429],
   [ 0.62786254, -0.61449831],
   [ 0.59657118,  0.51004927]])

The correct result, e.g. as returned by sklearn's PCA:
([[ 0.26424835, -2.25344912],
 [-1.29695602,  0.60127941],
 [-3.59711235,  1.28967825],
 [ 2.45930604,  0.45229125],
 [-1.33335186, -1.82830153],
 [ 1.82020954,  1.04090645],
 [ 0.85747383, -0.21347615],
 [ 0.82618248,  0.91107143]])
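
One way to narrow this kind of mismatch down is to subtract the two outputs from each other. The sketch below only uses the two arrays printed above (the variable names `mine` and `expected` are made up for illustration) and checks whether every row differs by the same vector. If it does, the two projections use the same axes and differ only by a translation, which suggests looking at how the data is shifted rather than at the eigenvectors.

```python
import numpy as np

# Output of the PCA class from the question (copied from above).
mine = np.array([
    [ 0.03463706, -2.65447128],
    [-1.52656731,  0.20025725],
    [-3.82672364,  0.88865609],
    [ 2.22969475,  0.05126909],
    [-1.56296316, -2.22932369],
    [ 1.59059825,  0.63988429],
    [ 0.62786254, -0.61449831],
    [ 0.59657118,  0.51004927],
])

# Reference output (copied from above).
expected = np.array([
    [ 0.26424835, -2.25344912],
    [-1.29695602,  0.60127941],
    [-3.59711235,  1.28967825],
    [ 2.45930604,  0.45229125],
    [-1.33335186, -1.82830153],
    [ 1.82020954,  1.04090645],
    [ 0.85747383, -0.21347615],
    [ 0.82618248,  0.91107143],
])

# Row-wise difference between the two projections.
diff = mine - expected

# True if every row of diff equals the first row, i.e. the offset is constant.
constant_offset = np.allclose(diff, diff[0], atol=1e-6)
print(diff[0], constant_offset)
```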

The input is defined as follows: 
X = np.array([
[-2.133268233289599,0.903819474847349,2.217823388231679,-0.444779660856219,-0.661480010318842,-0.163814281248453,-0.608167714051449, 0.949391996219125],
[-1.273486742804804,-1.270450725314960,-2.873297536940942, 1.819616794091556,-2.617784834189455, 1.706200163080549,0.196983250752276,0.501491995499840],
[-0.935406638147949,0.298594472836292,1.520579082270122,-1.390457671168661,-1.180253547776717,-0.194988736923602,-0.645052874385757,-1.400566775105519]]).T 

1 answer:

Answer 0 (score: 4)

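You need to center the data by subtracting the mean before projecting it onto the new basis. Your constructor centers X only locally to compute the covariance matrix, while project() receives the original, uncentered X: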

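# X is the input array defined in the question
mu = X.mean(0)                       # per-feature mean
C = np.cov(X - mu, rowvar=False)     # covariance of the centered data
d, u = np.linalg.eigh(C)             # eigenvalues in ascending order
U = u.T[::-1]                        # eigenvectors as rows, descending eigenvalue
Z = np.dot(X - mu, U[:2].T)          # project the centered data onto the top 2 components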

print(Z)
# [[ 0.26424835 -2.25344912]
#  [-1.29695602  0.60127941]
#  [-3.59711235  1.28967825]
#  [ 2.45930604  0.45229125]
#  [-1.33335186 -1.82830153]
#  [ 1.82020954  1.04090645]
#  [ 0.85747383 -0.21347615]
#  [ 0.82618248  0.91107143]]
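
If you want to keep the class structure from the question, one possible way to apply the same fix is to store the mean in the constructor and subtract it again inside project(). This is only a minimal sketch: the class name `CenteredPCA` and the attribute `mean` are made up for illustration and are not part of the original code.

```python
import numpy as np

class CenteredPCA:
    """Hypothetical variant of the question's PCA class that remembers the mean."""

    def __init__(self, X):
        # Store the per-feature mean so project() can reuse it.
        self.mean = X.mean(axis=0)
        # Covariance of the centered data, with data points in rows.
        C = np.cov(X - self.mean, rowvar=False)
        # eigh returns eigenvalues in ascending order; reverse both to descending.
        d, u = np.linalg.eigh(C)
        self.U = u.T[::-1]
        self.D = d[::-1]

    def project(self, X, m):
        # Center with the stored mean, then project onto the top m eigenvectors.
        return np.dot(X - self.mean, self.U[:m].T)

# Input data from the question.
X = np.array([
    [-2.133268233289599, 0.903819474847349, 2.217823388231679, -0.444779660856219,
     -0.661480010318842, -0.163814281248453, -0.608167714051449, 0.949391996219125],
    [-1.273486742804804, -1.270450725314960, -2.873297536940942, 1.819616794091556,
     -2.617784834189455, 1.706200163080549, 0.196983250752276, 0.501491995499840],
    [-0.935406638147949, 0.298594472836292, 1.520579082270122, -1.390457671168661,
     -1.180253547776717, -0.194988736923602, -0.645052874385757, -1.400566775105519],
]).T

pca = CenteredPCA(X)
Z = pca.project(X, 2)
print(Z)
```

Storing the mean also means new data passed to project() is shifted by the mean of the data the model was built from, not by its own mean. Note that the sign of each eigenvector is arbitrary, so individual columns of Z may come out negated compared with another implementation.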