I have to implement my own PCA function Y, V = PCA(data, M, whitening), which computes the first M principal components and transforms the data, so that y_n = U^T x_n. The function should furthermore return V, the amount of variance that is explained by the transformation.
I have to reduce the dimensionality of the data from D = 4 to M = 2, given the function:
def PCA(data, nr_dimensions=None, whitening=False):
    """ perform PCA and reduce the dimension of the data (D) to nr_dimensions
    Input:
        data... samples, nr_samples x D
        nr_dimensions... dimension after the transformation, scalar
        whitening... False -> standard PCA, True -> PCA with whitening
    Returns:
        transformed data... nr_samples x nr_dimensions
        variance_explained... amount of variance explained by the first nr_dimensions principal components, scalar"""
    if nr_dimensions is not None:
        dim = nr_dimensions
    else:
        dim = 2
What I have done so far is the following:
import numpy as np
import matplotlib.cm as cm
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
import scipy.stats as stats
from scipy.stats import multivariate_normal
import pdb
import sklearn
from sklearn import datasets
#covariance matrix
mean_vec = np.mean(data, axis=0)  # per-feature means, needed to center the data
cov_mat = (data - mean_vec).T.dot((data - mean_vec)) / (data.shape[0] - 1)
print('Covariance matrix \n%s' % cov_mat)
# now the eigendecomposition of the cov matrix
cov_mat = np.cov(data.T)  # np.cov expects variables in rows, hence the transpose; equals cov_mat above
eig_vals, eig_vecs = np.linalg.eig(cov_mat)
print('Eigenvectors \n%s' % eig_vecs)
print('\nEigenvalues \n%s' % eig_vals)
# Make a list of (eigenvalue, eigenvector) tuples
eig_pairs = [(np.abs(eig_vals[i]), eig_vecs[:,i]) for i in range(len(eig_vals))]
This is the point where I am stuck: I don't know how to go on from here and actually reduce the dimensionality.
Any help would be welcome! :)
Answer 0 (score: 0)
For the case where the initial matrix A, containing the samples and features, has shape=[samples, features]:
from numpy import array
from numpy import mean
from numpy import cov
from numpy.linalg import eig
# define a matrix
A = array([[1, 2], [3, 4], [5, 6]])
print(A)
# calculate the mean of each column, assuming each column is a variable/feature
M = mean(A.T, axis=1)
print(M)
# center columns by subtracting column means
C = A - M
print(C)
# calculate covariance matrix of centered matrix
V = cov(C.T)
print(V)
# eigendecomposition of covariance matrix
values, vectors = eig(V)
print(vectors)
print(values)
# project the centered data onto all principal components (no dimensionality reduction yet)
P = vectors.T.dot(C.T)
print(P.T)
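To get from here to an actual dimensionality reduction, keep only the first M eigenvectors (sorted by descending eigenvalue) before projecting. A minimal sketch continuing from the variables above; M, order, P_reduced, and variance_explained are names I introduce here, not part of the original answer:
# sort components by descending eigenvalue, keep the first M, then project
M = 1  # target dimensionality for this 2-D toy matrix
order = values.argsort()[::-1]
values, vectors = values[order], vectors[:, order]
P_reduced = C.dot(vectors[:, :M])  # shape: [samples, M]
variance_explained = values[:M].sum() / values.sum()  # fraction in [0, 1]
print(P_reduced)
print(variance_explained)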
Answer 1 (score: 0)
PCA is essentially the same as singular value decomposition, so you can use numpy.linalg.svd:
import numpy as np
def PCA(U, ndim, whitening=False):
    U = U - U.mean(axis=0)  # PCA requires centered data
    L, G, R = np.linalg.svd(U, full_matrices=False)  # rows of R are the principal axes
    if not whitening:
        L = L * G  # scale each column by its singular value (G is 1-D, so use *, not @)
    Y = L[:, :ndim]  # transformed data, nr_samples x ndim
    return Y, G[:ndim]
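For reference, a quick sanity check of this version on the D = 4 to M = 2 case from the question; using the Iris data here is my own choice, not part of the answer:
from sklearn import datasets
X = datasets.load_iris().data  # 150 samples, 4 features
Y, sv = PCA(X, ndim=2)
print(Y.shape)  # (150, 2)
# fraction of variance captured by the first 2 components
total = (np.linalg.svd(X - X.mean(axis=0), compute_uv=False) ** 2).sum()
print((sv ** 2).sum() / total)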
If you want to use the eigenvalue problem instead, then, assuming that the number of samples is larger than the number of features (otherwise there would not be enough data), it is inefficient to compute the spatial correlations (left eigenvectors) directly. Instead, compute the right eigenvectors, as SVD does, and reconstruct the left ones from them:
def PCA(U, ndim, whitening=False):
    U = U - U.mean(axis=0)  # PCA requires centered data
    K = U.T @ U  # Gram matrix; its eigenvectors are the right singular vectors
    G, R = np.linalg.eigh(K)  # eigenvalues are returned in ascending order
    G = G[::-1]  # G is 1-D, so reverse to descending order
    R = R[:, ::-1]  # eigenvectors are the columns, so reverse the columns
    L = U @ R  # reconstructing the left singular vectors
    nrm = np.linalg.norm(L, axis=0, keepdims=True)  # the norms are the singular values
    L /= nrm  # normalizing them
    if not whitening:
        L = L * np.sqrt(G)  # singular values are the square roots of the eigenvalues of K
    Y = L[:, :ndim]  # transformed data, nr_samples x ndim
    return Y, np.sqrt(G[:ndim])
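As a quick consistency check (my addition, not part of the answer), both versions should agree up to the sign of each component; this assumes the SVD-based version above has been renamed to PCA_svd so the two definitions can coexist:
import numpy as np
from sklearn import datasets
X = datasets.load_iris().data
Y_svd, sv_svd = PCA_svd(X, ndim=2)  # PCA_svd: hypothetical rename of the SVD version
Y_eig, sv_eig = PCA(X, ndim=2)
print(np.allclose(np.abs(Y_svd), np.abs(Y_eig)))  # columns match up to sign
print(np.allclose(sv_svd, sv_eig))  # singular values match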