Principal component analysis for dimensionality reduction in Python

Posted: 2018-06-27 13:46:07

Tags: python numpy machine-learning scikit-learn pca

I have to implement my own PCA function Y, V = PCA(data, M, whitening) that computes the first M principal components and transforms the data, so that y_n = U^T x_n. The function should furthermore return V, the amount of variance explained by the transformation.
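If I read the assignment correctly, the variance explained by the first M components is the ratio V = (λ_1 + ... + λ_M) / (λ_1 + ... + λ_D), where λ_1 ≥ ... ≥ λ_D are the eigenvalues of the data covariance matrix.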

I have to reduce the dimensionality of the data from D = 4 down to M = 2, given the following function stub:

def PCA(data, nr_dimensions=None, whitening=False):
    """ perform PCA and reduce the dimension of the data (D) to nr_dimensions
    Input:
        data... samples, nr_samples x D
        nr_dimensions... dimension after the transformation, scalar
        whitening... False -> standard PCA, True -> PCA with whitening

    Returns:
        transformed data... nr_samples x nr_dimensions
        variance_explained... amount of variance explained by the first nr_dimensions principal components, scalar"""
    if nr_dimensions is not None:
        dim = nr_dimensions
    else:
        dim = 2

What I have done so far is the following:

import numpy as np
import matplotlib.cm as cm
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
import scipy.stats as stats

from scipy.stats import multivariate_normal
import pdb

import sklearn
from sklearn import datasets

#covariance matrix
mean_vec = np.mean(data, axis=0)   # column-wise (per-feature) means, not the grand mean
cov_mat = (data - mean_vec).T.dot((data - mean_vec)) / (data.shape[0] - 1)
print('Covariance matrix \n%s' % cov_mat)

#now the eigendecomposition of the cov matrix
cov_mat = np.cov(data.T)
eig_vals, eig_vecs = np.linalg.eig(cov_mat)
print('Eigenvectors \n%s' % eig_vecs)
print('\nEigenvalues \n%s' % eig_vals)

# Make a list of (eigenvalue, eigenvector) tuples
eig_pairs = [(np.abs(eig_vals[i]), eig_vecs[:,i]) for i in range(len(eig_vals))]

This is the point where I am stuck: I don't know how to proceed from here and actually reduce the dimensionality.

Any help would be welcome! :)

2 answers:

Answer 0 (score: 0)

Here is a simple example for the case where the initial matrix A, which contains the samples and features, has shape=[samples, features]:

from numpy import array
from numpy import mean
from numpy import cov
from numpy.linalg import eig

# define a matrix
A = array([[1, 2], [3, 4], [5, 6]])
print(A)

# calculate the mean of each column, since I assume each column is a variable/feature
M = mean(A.T, axis=1)
print(M)

# center columns by subtracting column means
C = A - M
print(C)

# calculate covariance matrix of centered matrix
V = cov(C.T)
print(V)

# eigendecomposition of covariance matrix
values, vectors = eig(V)
print(vectors)
print(values)

# project the centered data onto all principal components (no dimension reduction yet)
P = vectors.T.dot(C.T)
print(P.T)
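Note that np.linalg.eig does not guarantee any ordering of the eigenvalues, and the projection above keeps all components. To actually reduce the dimension, sort the components by eigenvalue and keep only the first few. A minimal sketch of that last step, continuing from the example above (the names order, W, and Y are mine, not from the answer):

import numpy as np

# sort components by descending eigenvalue
order = np.argsort(values)[::-1]
W = vectors[:, order[:2]]      # projection matrix built from the top-2 eigenvectors

# reduced data, shape [samples, 2]
Y = C.dot(W)
print(Y)

# scalar fraction of the total variance retained by the kept components
variance_explained = values[order[:2]].sum() / values.sum()
print(variance_explained)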

Answer 1 (score: 0)

PCA is essentially the same thing as the singular value decomposition, so you can use numpy.linalg.svd:

import numpy as np

def PCA(U, ndim, whitening=False):
    U = U - U.mean(axis=0)             # PCA operates on centered data
    L, G, R = np.linalg.svd(U, full_matrices=False)
    if not whitening:
        L = L * G                      # G is 1-D: scale each column of L by its singular value
    Y = L[:, :ndim]                    # transformed data, nr_samples x ndim
    return Y, G[:ndim]
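To try it on the asker's D = 4 → M = 2 setting, one can use for example the Iris data shipped with scikit-learn (my own test, not part of the original answer):

from sklearn import datasets

X = datasets.load_iris().data   # shape (150, 4), so D = 4
Y, G = PCA(X, 2)                # reduce to M = 2
print(Y.shape)                  # (150, 2)
print(G)                        # the first two singular values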

If you want to use the eigenvalue problem instead, then assuming the number of samples is larger than the number of features (otherwise the data would be rank-deficient), it is inefficient to compute the left eigenvectors (the sample-space correlations) directly. Instead, compute the right eigenvectors from the small feature-space matrix and reconstruct the left ones from them:

def PCA(U, ndim, whitening=False):
    U = U - U.mean(axis=0)        # center the data
    K = U.T @ U                   # small D x D matrix -> right eigenvectors
    G, R = np.linalg.eigh(K)      # eigh returns eigenvalues in ascending order
    G = G[::-1]                   # G is 1-D: reverse to descending order
    R = R[:, ::-1]                # eigenvectors are columns: reorder them to match
    L = U @ R                     # reconstruct the left eigenvectors
    nrm = np.linalg.norm(L, axis=0, keepdims=True)   # their norms are the singular values
    L /= nrm                      # normalize them
    if not whitening:
        L = L * nrm               # rescale columns by the singular values
    Y = L[:, :ndim]               # transformed data, nr_samples x ndim
    return Y, np.sqrt(G[:ndim])   # sqrt of eigenvalues = singular values, as above
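As a quick sanity check (my addition, not part of the original answer), both versions should reproduce scikit-learn's PCA scores up to the sign of each component:

import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA as SkPCA

X = datasets.load_iris().data
Y, _ = PCA(X, 2)
Y_sk = SkPCA(n_components=2).fit_transform(X)
print(np.allclose(np.abs(Y), np.abs(Y_sk)))   # True: same scores up to sign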