Question

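I ran a linear SVM on a large dataset, but to reduce the number of dimensions I first performed a PCA and then trained the SVM on a subset of the component scores (the first 650 components, which explain 99.5% of the variance). Now I want to use the beta weights and bias from the SVM created in PCA space to plot the decision boundary in the original variable space. However, I can't figure out how to project the bias term from the SVM back into the original variable space. I've written a demo using the Fisher iris data to illustrate: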

clear; clc; close all

% load data
load fisheriris
inds = ~strcmp(species,'setosa');
X = meas(inds,3:4);
Y = species(inds);
mu = mean(X)

% perform the PCA
[eigenvectors, scores] = pca(X);

% train the svm
SVMModel = fitcsvm(scores,Y);

% plot the result
figure(1)
gscatter(scores(:,1),scores(:,2),Y,'rgb','osd')
title('PCA space')

% now plot the decision boundary
betas = SVMModel.Beta; 
m = -betas(1)/betas(2); % my gradient
b = -SVMModel.Bias;     % my y-intercept
f = @(x) m.*x + b;      % my linear equation
hold on
fplot(f,'k')
hold off
axis equal
xlim([-1.5 2.5])
ylim([-2 2])

% inverse transform the PCA
Xhat = scores * eigenvectors';
Xhat = bsxfun(@plus, Xhat, mu);

% plot the result
figure(2)
hold on
gscatter(Xhat(:,1),Xhat(:,2),Y,'rgb','osd')

% and the decision boundary
betaHat = betas' * eigenvectors';
mHat = -betaHat(1)/betaHat(2);
bHat = b * eigenvectors';
bHat = bHat + mu;    % I know I have to add mu somewhere...
bHat = bHat/betaHat(2);
bHat = sum(sum(bHat)); % sum to reduce the matrix to a single value
% the correct value of bHat should be 6.3962

f = @(x) mHat.*x + bHat;
fplot(f,'k')
hold off

axis equal
title('Recovered feature space')
xlim([3 7])
ylim([0 4])

Any pointers on how I am calculating bHat incorrectly would be greatly appreciated.

Answer 1

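In case anyone else comes across this problem: the bias term can be used to find the y-intercept in PCA space, -SVMModel.Bias/betas(2). That y-intercept is just another point in the space, [0 yint], so it can be recovered (un-rotated) by inverse transforming it through the PCA, which means multiplying by eigenvectors' and adding mu back. The recovered point lies on the decision boundary in the original space, so it can be used to solve the linear equation y = mx + b for the new intercept (i.e., b = y - mx). So the code should be: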

% and the decision boundary
betaHat = betas' * eigenvectors';        % weights in the original space
mHat = -betaHat(1)/betaHat(2);           % slope in the original space
yint = -SVMModel.Bias/betas(2);          % y-intercept in PCA space
yintHat = [0 yint] * eigenvectors';      % rotate that point back to the original space
yintHat = yintHat + mu;                  % undo the PCA centering
bHat = yintHat(2) - mHat*yintHat(1);     % solve the linear equation b = y - m*x
% the correct value of bHat is now 6.3962
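
As a cross-check on the approach above, here is a minimal self-contained sketch that computes the intercept two ways and compares them. The second route is not from the original answer: it assumes the PCA-space boundary betas'*s + Bias = 0 with s = (x - mu)*eigenvectors, which rearranges to betaHat*(x - mu)' + Bias = 0 in the original space. The name bHatDirect is introduced here purely for illustration.

```matlab
% Sketch only; requires the Statistics and Machine Learning Toolbox,
% as the demo in the question does.
load fisheriris
inds = ~strcmp(species,'setosa');
X  = meas(inds,3:4);
Y  = species(inds);
mu = mean(X);

[eigenvectors, scores] = pca(X);   % pca centers X by default
SVMModel = fitcsvm(scores, Y);     % linear kernel by default

betas   = SVMModel.Beta;           % weights in PCA space (column vector)
betaHat = betas' * eigenvectors';  % weights in the original space (row vector)
mHat    = -betaHat(1)/betaHat(2);  % slope in the original space

% Route 1: map the PCA-space y-intercept point back, as in the answer.
yint    = -SVMModel.Bias/betas(2);
yintHat = [0 yint] * eigenvectors' + mu;
bHat    = yintHat(2) - mHat*yintHat(1);

% Route 2 (assumption stated above): closed form from
% betaHat*(x - mu)' + Bias = 0, solved for x(2) at x(1) = 0.
bHatDirect = (betaHat*mu' - SVMModel.Bias)/betaHat(2);

fprintf('bHat = %.4f, bHatDirect = %.4f\n', bHat, bHatDirect);
```

If the two printed values agree, the recovered line is consistent with the SVM trained in PCA space. The same algebra suggests that in higher dimensions, where a slope/intercept form is not available, the boundary in the original space can be written with weights betaHat and bias SVMModel.Bias - betaHat*mu', provided eigenvectors holds only the components that were kept for training.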

How do I plot the decision boundary of a linear SVM with PCA in Matlab?
