Is my Matlab code correct for applying PCA to data?

Time: 2014-08-01 09:28:03

Tags: matlab classification pca

I have the following code for computing PCA in Matlab:

train_out = train';
test_out = test';
% subtract off the mean for each dimension
mn = mean(train_out,2);
train_out = train_out - repmat(mn,1,train_size);
test_out = test_out - repmat(mn,1,test_size);
% calculate the covariance matrix
covariance = 1 / (train_size-1) * train_out * train_out';
% find the eigenvectors and eigenvalues
[PC, V] = eig(covariance);
% extract diagonal of matrix as vector
V = diag(V);
% sort the variances in decreasing order
[junk, rindices] = sort(-1*V);
V = V(rindices);
PC = PC(:,rindices);
% project the original data set
out = PC' * train_out;
train_out = out';
out = PC' * test_out;
test_out = out';

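The train and test matrices have observations in rows and feature variables in columns. When I classify the original data (without PCA), I get much better results than with PCA, even when I keep all dimensions. When I tried running PCA directly on the whole dataset (train + test), I noticed that the correlations between these new principal components and the previous ones are close to 1 or close to -1, which I find odd. I am probably doing something wrong, but I cannot figure out what.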

1 Answer:

Answer 0: (score: 2)

The code is correct, but I find it easier to use the princomp function:

train_out=train; % save original data
test_out=test;
mn = mean(train_out);
train_out = bsxfun(@minus,train_out,mn); % subtract mean
test_out = bsxfun(@minus,test_out,mn);
[coefs,scores,variances] = princomp(train_out,'econ'); % PCA
pervar = cumsum(variances) / sum(variances);
dims = find(pervar >= var_frac, 1); % smallest number of dimensions reaching var_frac - e.g. 0.99 - fraction of variance explained
train_out = train_out*coefs(:,1:dims); % dims - keep this many dimensions
test_out = test_out*coefs(:,1:dims); % result is in train_out and test_out
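
As a sanity check on the manual approach from the question, here is a minimal sketch on synthetic data. The data, the sizes, and names such as `train_c`, `axis_match`, and `d_before` are made up for illustration and are not from the original post. It compares the eigendecomposition route with an SVD of the centered training data. The two should give the same variances and the same axes up to sign, which is one plausible reason for seeing correlations near -1 as well as near 1 between components from two runs. It also checks that keeping all dimensions is only a rotation, so distances between observations do not change.

```matlab
% Synthetic data (hypothetical): rows are observations, columns are features.
t = (1:50)';
X = [3*sin(t), cos(2*t), 0.5*t + sin(3*t), t.^2/100, 0.3*cos(t).*sin(5*t)];
train = X(1:40,:);
test  = X(41:50,:);

% Center both sets with the mean of the training set only.
mn = mean(train, 1);
train_c = bsxfun(@minus, train, mn);
test_c  = bsxfun(@minus, test,  mn);

% Route 1: eigendecomposition of the covariance matrix, as in the question.
C = (train_c' * train_c) / (size(train_c,1) - 1);
C = (C + C') / 2;                      % remove any rounding asymmetry
[PC, V] = eig(C);
[vars_eig, idx] = sort(diag(V), 'descend');
PC = PC(:, idx);

% Route 2: SVD of the centered training data.
[U, S, W] = svd(train_c, 'econ');
vars_svd = diag(S).^2 / (size(train_c,1) - 1);

% Matching axes agree up to sign: each absolute dot product is close to 1.
axis_match = abs(sum(PC .* W, 1));

% With all dimensions kept, the projection is a rotation (possibly with
% reflections), so the distance between two test observations is unchanged.
test_p   = test_c * PC;
d_before = norm(test_c(1,:) - test_c(2,:));
d_after  = norm(test_p(1,:) - test_p(2,:));
```

If the classifier depends only on distances or inner products between observations, a full-dimensional projection like `test_p` should not change its results by itself. A large accuracy drop with all dimensions kept would then point to something outside the PCA step, such as how the projected data is passed to the classifier.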