I am trying to use MATLAB's princomp for dimensionality reduction, but I am not sure I am doing it right. Here is my test code; I am not sure the projection is correct:
A = rand(4,3)
AMean = mean(A)
[n m] = size(A)
Ac = (A - repmat(AMean,[n 1]))
pc = princomp(A)
k = 2; %Number of first principal components
A_pca = Ac * pc(1:k,:)' %Not sure I'm doing projection right
reconstructedA = A_pca * pc(1:k,:)
error = reconstructedA - Ac
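For comparison, here is a variant that selects components by column, which I believe matches the documentation (princomp returns the coefficients with one principal component per column, so I select the first k columns rather than rows; variable names are mine):

```matlab
% Variant: princomp returns coefficients column-wise (one component per column),
% so select the first k columns rather than the first k rows.
A = rand(4,3);
AMean = mean(A);
[n, m] = size(A);
Ac = A - repmat(AMean, [n 1]);        % center the data
pc = princomp(A);                     % columns of pc are principal components
k = 2;                                % number of principal components to keep
A_pca = Ac * pc(:,1:k);               % project onto the first k components
reconstructedA = A_pca * pc(:,1:k)';  % map back to the original space
err = reconstructedA - Ac;            % reconstruction error
```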
Here is my code for face recognition using the ORL dataset:
%load orl_data: a 400x768 double matrix (400 images, 768 features)
%make labels
orl_label = [];
for i = 1:40
orl_label = [orl_label;ones(10,1)*i];
end
n = size(orl_data,1);
k = randperm(n);
s = round(0.25*n); %Take 25% for train
%Raw pixels
%Split on test and train sets
data_tr = orl_data(k(1:s),:);
label_tr = orl_label(k(1:s),:);
data_te = orl_data(k(s+1:end),:);
label_te = orl_label(k(s+1:end),:);
tic
[nn_ind, estimated_label] = EuclDistClassifier(data_tr,label_tr,data_te);
toc
rate = sum(estimated_label == label_te)/size(label_te,1)
%Using PCA
tic
pc = princomp(data_tr);
toc
mean_face = mean(data_tr);
pc_n = 100;
f_pc = pc(1:pc_n,:)';
data_pca_tr = (data_tr - repmat(mean_face, [s,1])) * f_pc;
data_pca_te = (data_te - repmat(mean_face, [n-s,1])) * f_pc;
tic
[nn_ind, estimated_label] = EuclDistClassifier(data_pca_tr,label_tr,data_pca_te);
toc
rate = sum(estimated_label == label_te)/size(label_te,1)
If I choose enough principal components, it gives me the same recognition rate as raw pixels. If I use only a small number of principal components, the rate with PCA is worse.
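To decide how many components are "enough", I look at the per-component variances returned as the third output of princomp (a sketch; the 95% threshold and variable names are my own choice):

```matlab
% Sketch: choose pc_n so the kept components explain, e.g., 95% of variance.
[pc, score, latent] = princomp(data_tr);    % latent = variance of each component
explained = cumsum(latent) / sum(latent);   % cumulative fraction of variance
pc_n = find(explained >= 0.95, 1);          % smallest k reaching the threshold
```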
Here are my questions:

Is princomp the best way to compute the first k principal components in MATLAB? I also tried computing on the GPU with gpuArray:
%Test using GPU
tic
A_cpu = rand(30000,32*24);
A = gpuArray(A_cpu);
AMean = mean(A);
[n m] = size(A)
pc = princomp(A);
k = 100;
A_pca = (A - repmat(AMean,[n 1])) * pc(1:k,:)';
A_pca_cpu = gather(A_pca);
toc
clear;
tic
A = rand(30000,32*24);
AMean = mean(A);
[n m] = size(A)
pc = princomp(A);
k = 100;
A_pca = (A - repmat(AMean,[n 1])) * pc(1:k,:)';
toc
clear;
It works faster, but it is not suitable for large matrices. Maybe I am doing something wrong? If I use a large matrix, it gives me:

Out of memory on device (gpuArray error).
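One workaround I am considering (my own sketch, not verified on large data): compute the components on the CPU, then push only one block of rows to the GPU at a time for the projection, so the full matrix never has to be resident on the device:

```matlab
% Sketch: CPU for princomp, GPU for block-wise projection.
A = rand(30000, 32*24);
AMean = mean(A);
pc = princomp(A);                 % components computed on the CPU
k = 100;
blk = 5000;                       % rows per GPU block (tunable)
n = size(A,1);
A_pca = zeros(n, k);
pc_gpu = gpuArray(pc(:,1:k));     % keep only the first k components on device
for i = 1:blk:n
    j = min(i+blk-1, n);
    Ac_gpu = gpuArray(bsxfun(@minus, A(i:j,:), AMean));  % center one block
    A_pca(i:j,:) = gather(Ac_gpu * pc_gpu);              % project and pull back
end
```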
Answer (score: 1):
"Is the princomp function the best way to compute the first k principal components in MATLAB?"
It computes a full SVD, so it will be slow on large datasets. You can get a significant speedup by specifying the number of dimensions you need at the outset and computing a partial SVD; the MATLAB function for a partial SVD is svds.
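A sketch of how svds could be used to get the top-k PCA subspace (assumed usage; variable names are mine):

```matlab
% Sketch: partial SVD via svds, computing only the k largest singular triples.
X = data_tr;                        % n-by-m data matrix
Xc = bsxfun(@minus, X, mean(X));    % center the columns
k = 100;
[U, S, V] = svds(Xc, k);            % partial SVD: k largest singular values
pcs = V;                            % columns of V span the top-k PCA subspace
X_pca = Xc * pcs;                   % projected data (equivalently U*S)
```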
If svds is not fast enough, there is a more modern implementation here:
http://cims.nyu.edu/~tygert/software.html(matlab版本:http://code.google.com/p/framelet-mri/source/browse/pca.m)
(参见描述算法的文章http://cims.nyu.edu/~tygert/blanczos.pdf)
You can control the error of the approximation by increasing the number of singular vectors you compute; precise bounds are given in the linked paper. Here is an example:
>> A = rand(40,30); %random rank-30 matrix
>> [U,S,V] = pca(A,2); %compute a rank-2 approximation to A
>> norm(A-U*S*V',2)/norm(A,2) %relative error
ans =
0.1636
>> [U,S,V] = pca(A,25); %compute a rank-25 approximation to A
>> norm(A-U*S*V',2)/norm(A,2) %relative error
ans =
0.0410
When you have big data and a sparse matrix, computing the full SVD is often infeasible, because the factors are almost never sparse. In that case you must compute a partial SVD to fit in memory. For example:
>> A = sprandn(5000,5000,10000);
>> tic;[U,S,V]=pca(A,2);toc;
no pivots
Elapsed time is 124.282113 seconds.
>> tic;[U,S,V]=svd(A);toc;
??? Error using ==> svd
Use svds for sparse singular values and vectors.
>> tic;[U,S,V]=princomp(A);toc;
??? Error using ==> svd
Use svds for sparse singular values and vectors.
Error in ==> princomp at 86
[U,sigma,coeff] = svd(x0,econFlag); % put in 1/sqrt(n-1) later
>> tic;pc=princomp(A);toc;
??? Error using ==> eig
Use eigs for sparse eigenvalues and vectors.
Error in ==> princomp at 69
[coeff,~] = eig(x0'*x0);