We are working on a project and trying to get some results with KPCA.
We have a dataset of handwritten digits and have taken the first 200 samples of each digit, so our full traindata matrix is 2000x784 (784 being the dimensionality). When we run KPCA we get back a matrix with the new low-dimensional dataset, e.g. 2000x100, but we do not understand the result: shouldn't we also get other matrices, as we do when we use SVD for PCA? The code we use for KPCA is the following:
function data_out = kernelpca(data_in,num_dim)
%% Checking to ensure output dimensions are lesser than input dimension.
if num_dim > size(data_in,1)
    fprintf('\nDimensions of output data have to be less than the dimensions of input data\n');
    fprintf('Closing program\n');
    return
end
%% Using the Gaussian Kernel to construct the Kernel K
% K(x,y) = exp(-||x - y||^2 / sigma^2)
% K is a symmetric Kernel
K = zeros(size(data_in,2),size(data_in,2));
for row = 1:size(data_in,2)
    for col = 1:row
        temp = sum(((data_in(:,row) - data_in(:,col)).^2));
        K(row,col) = exp(-temp); % sigma = 1
    end
end
K = K + K';
% Dividing the diagonal element by 2 since it has been added to itself
for row = 1:size(data_in,2)
    K(row,row) = K(row,row)/2;
end
% We know that for PCA the data has to be centered. Even if the input data
% set 'X' is centered, there is no guarantee that the data mapped into the
% feature space [phi(x)] is also centered. Since we never actually work in
% the feature space we cannot center the data there. To include this
% correction a pseudo centering is done using the Kernel.
one_mat = ones(size(K))/size(K,1); % N x N matrix with every entry equal to 1/N
K_center = K - one_mat*K - K*one_mat + one_mat*K*one_mat;
clear K
%% Obtaining the low dimensional projection
% The following eigenvalue problem needs to be satisfied for K
% K*alpha = N*lamda*alpha
% Thus the lamda's have to be normalized by the number of points
opts.issym=1;
opts.disp = 0;
opts.isreal = 1;
neigs = 30;
[eigvec, eigval] = eigs(K_center,neigs,'lm',opts);
eig_val = eigval./size(data_in,2); % lamda = eigenvalue / N
% Again 1 = lamda*(alpha.alpha)
% Here '.' indicates dot product
for col = 1:size(eigvec,2)
    eigvec(:,col) = eigvec(:,col)./(sqrt(eig_val(col,col)));
end
[~, index] = sort(diag(eig_val),'descend');
eigvec = eigvec(:,index);
%% Projecting the data in lower dimensions
data_out = zeros(num_dim,size(data_in,2));
for count = 1:num_dim
    data_out(count,:) = eigvec(:,count)'*K_center';
end
We have read a lot of papers but still cannot grasp the logic of KPCA!
Any help would be appreciated!
Answer 0 (score: 3)
The PCA algorithm:

Given the data samples:
- compute the mean
- compute the covariance matrix
- solve the eigenvalue problem C*v = lambda*v

where C is the covariance matrix, v are the eigenvectors of the covariance matrix, and lambda are the eigenvalues of the covariance matrix.

Using the first n eigenvectors you can reduce the dimensionality of your data to n dimensions. You can use this code for PCA; it has an integrated example and is quite simple.
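A minimal MATLAB sketch of these steps (an illustration, not the linked code; it assumes X is an N x d matrix with one sample per row and num_dim is the target dimensionality):

mu = mean(X,1);                        % 1. compute the mean
Xc = X - repmat(mu,size(X,1),1);       % center the samples
C  = (Xc'*Xc)/size(X,1);               % 2. compute the covariance (d x d)
[V,D] = eig(C);                        % 3. solve C*v = lambda*v
[~,idx] = sort(diag(D),'descend');     % order by decreasing eigenvalue
V = V(:,idx);
X_low = Xc*V(:,1:num_dim);             % project onto the first num_dim eigenvectors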
The KPCA algorithm:

We choose a kernel function, in your code specified by:

K(x,y) = exp(-||x - y||^2 / sigma^2)

in order to represent your data in a high-dimensional space in which the data may be well represented for tasks such as classification or clustering, tasks that might be harder to solve in the initial feature space. This trick is also known as the "kernel trick". Take a look at the figure.
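For concreteness, the same kernel can be written as a MATLAB anonymous function (a sketch; the vectors and sigma = 1 below are only illustrative):

rbf = @(x,y,sigma) exp(-sum((x - y).^2)/sigma^2); % Gaussian (RBF) kernel
k = rbf([1;2],[1.5;2.5],1);                       % kernel value for two column vectors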
[ Step1 ] Construct the Gram matrix
K = zeros(size(data_in,2),size(data_in,2));
for row = 1:size(data_in,2)
    for col = 1:row
        temp = sum(((data_in(:,row) - data_in(:,col)).^2));
        K(row,col) = exp(-temp); % sigma = 1
    end
end
K = K + K';
% Dividing the diagonal element by 2 since it has been added to itself
for row = 1:size(data_in,2)
    K(row,row) = K(row,row)/2;
end
Here, because the Gram matrix is symmetric, only half of the values are computed, and the final result is obtained by adding the matrix computed so far to its transpose. Finally, the diagonal elements are divided by 2 since, as the comment mentions, each of them has been added to itself.
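As a side note, the same Gram matrix can be built without loops; a minimal sketch (assuming, as in the code, that data_in is d x N with one sample per column and sigma = 1):

sq = sum(data_in.^2,1);                           % 1 x N squared norms
D2 = bsxfun(@plus,sq',sq) - 2*(data_in'*data_in); % N x N squared distances ||x - y||^2
K  = exp(-D2);                                    % Gram matrix, sigma = 1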
[ Step2 ] Center the kernel matrix
This is done by this part of your code:
one_mat = ones(size(K))/size(K,1); % N x N matrix with every entry equal to 1/N
K_center = K - one_mat*K - K*one_mat + one_mat*K*one_mat;
As the comments state, a pseudo-centering procedure has to be performed. For the idea behind the proof, look here.
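In matrix form the centering reads K_center = K - 1_N*K - K*1_N + 1_N*K*1_N, where 1_N is the N x N matrix whose entries all equal 1/N. A quick sanity check (a sketch): after centering, every row and column of K_center sums to (numerically) zero:

N = size(K,1);
one_mat = ones(N)/N;                                      % every entry equals 1/N
K_center = K - one_mat*K - K*one_mat + one_mat*K*one_mat; % pseudo-centering
max(abs(sum(K_center,1)))                                 % should be close to 0
max(abs(sum(K_center,2)))                                 % should be close to 0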
[ Step3 ] Solve the eigenvalue problem
This part of the code is responsible for this task:
%% Obtaining the low dimensional projection
% The following eigenvalue problem needs to be satisfied for K
% K*alpha = N*lamda*alpha
% Thus the lamda's have to be normalized by the number of points
opts.issym=1;
opts.disp = 0;
opts.isreal = 1;
neigs = 30;
[eigvec, eigval] = eigs(K_center,neigs,'lm',opts);
eig_val = eigval./size(data_in,2); % lamda = eigenvalue / N
% Again 1 = lamda*(alpha.alpha)
% Here '.' indicates dot product
for col = 1:size(eigvec,2)
    eigvec(:,col) = eigvec(:,col)./(sqrt(eig_val(col,col)));
end
[~, index] = sort(diag(eig_val),'descend');
eigvec = eigvec(:,index);
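What happens here: eigs returns unit-norm eigenvectors, while KPCA requires lamda_i*(alpha_i . alpha_i) = 1 with lamda_i = eigval_i/N, so each eigenvector is rescaled by 1/sqrt(lamda_i). The same normalization without the loop (a sketch):

lambda = diag(eigval)/size(K_center,1);         % eigenvalues normalized by N
alpha  = bsxfun(@rdivide,eigvec,sqrt(lambda)'); % divide each column by sqrt(lamda_i)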
[ Step4 ] Change the representation of each data point
This part of the code is responsible for this task:
%% Projecting the data in lower dimensions
data_out = zeros(num_dim,size(data_in,2));
for count = 1:num_dim
    data_out(count,:) = eigvec(:,count)'*K_center';
end
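Applied to the question's setup this would look as follows (a sketch; kernelpca expects one sample per column, so the 2000x784 traindata matrix has to be transposed first). No extra matrices such as the U, S, V of an SVD are returned, because the projections are obtained directly from the kernel eigenvectors:

data_in  = traindata';               % 784 x 2000: one digit image per column
data_out = kernelpca(data_in,100);   % 100 x 2000: one projection per column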
Look here for more details.
PS: I encourage you to use the code written by this author, which also includes intuitive examples.