Question

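I have a simple MATLAB script that generates some random data and then classifies it with a Euclidean and a Mahalanobis classifier. My problem is that the two classifiers always give exactly the same error: they always misclassify the same vectors, even though the data is different on every run.

The data is created in a simple way so that the results are easy to check. Since there are three equiprobable classes, I generate 333 random vectors for each class and stack them all into X for classification. The correct result should therefore be [class 1, class 2, class 3], with 333 vectors in each.

I can tell the classifiers work, because the data created by mvnrnd is different every time and the error changes. But the error never differs between the two classifiers.

Can anyone explain why?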

% Create some initial values: means, covariance matrix, etc.
c = 3;
P = 1/c;                  % All 3 classes are equiprobable
N = 999;
nPerClass = round(N*P);   % 333 vectors per class
m1 = [1, 1];
m2 = [12, 8];
m3 = [16, 1];
m = [m1; m2; m3];
S = [4 0; 0 4];           % All classes share the same covariance matrix

% Generate random data for each class
X1 = mvnrnd(m1, S, nPerClass);
X2 = mvnrnd(m2, S, nPerClass);
X3 = mvnrnd(m3, S, nPerClass);
X = [X1; X2; X3];

% Create the ground-truth array xEst to compare results to:
% 333 ones, then 333 twos, then 333 threes
% (repelem avoids possible floating-point rounding in ceil(3/999:3/999:3))
xEst = repelem(1:c, nPerClass);

% Do the actual classification for Mahalanobis and Euclidean
zEuc = euc_mal_classifier(m', S, P, X', c, N, true);
zMal = euc_mal_classifier(m', S, P, X', c, N, false);

% Check the results: count the vectors assigned to the wrong class
numEucErr = sum(zEuc ~= xEst);
numMalErr = sum(zMal ~= xEst);

% Tell the user the results of the classification
strE = ['Euclidean classifier error percent:   ', num2str((numEucErr/N) * 100)];
strM = ['Mahalanobis classifier error percent: ', num2str((numMalErr/N) * 100)];
disp(strE);
disp(strM);
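The observation can be reproduced outside MATLAB. Below is a minimal stdlib-only Python sketch, not the asker's code: it assumes the same three means, 333 points per class and a shared covariance S = sigma^2 * I (sigma = 2, i.e. S = [4 0; 0 4]), and records which indices each classifier gets wrong. The function name `simulate` and the seeds are illustrative choices.

```python
import math
import random

def simulate(seed=0, per_class=333, sigma=2.0):
    """Return the sets of indices misclassified by each classifier (S = sigma^2 * I)."""
    rng = random.Random(seed)
    means = [(1, 1), (12, 8), (16, 1)]
    # Draw per_class isotropic Gaussian points around each mean, stacked in class order.
    X, truth = [], []
    for label, (mx, my) in enumerate(means, start=1):
        for _ in range(per_class):
            X.append((rng.gauss(mx, sigma), rng.gauss(my, sigma)))
            truth.append(label)
    inv_var = 1.0 / sigma ** 2  # S^{-1} = I / sigma^2 for an isotropic S
    euc_err, mah_err = set(), set()
    for i, (x, y) in enumerate(X):
        sq = [(x - mx) ** 2 + (y - my) ** 2 for mx, my in means]
        euc = [math.sqrt(s) for s in sq]            # Euclidean distances
        mah = [math.sqrt(s * inv_var) for s in sq]  # Mahalanobis distances
        # index(min(...)) picks the first nearest class, like MATLAB's min
        if euc.index(min(euc)) + 1 != truth[i]:
            euc_err.add(i)
        if mah.index(min(mah)) + 1 != truth[i]:
            mah_err.add(i)
    return euc_err, mah_err

euc_err, mah_err = simulate()
print(len(euc_err), len(mah_err), euc_err == mah_err)
```

Whatever the seed, the two sets come out identical, which is exactly the behaviour described in the question.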

Classifier

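function z = euc_mal_classifier(m, S, P, X, c, N, eOrM)
% Minimum-distance classifier.
%   m    : 2-by-c matrix of class means (one column per class)
%   S    : shared covariance matrix
%   P    : class prior (unused here, since all classes are equiprobable)
%   X    : 2-by-N matrix of vectors to classify
%   eOrM : true -> Euclidean distance, false -> Mahalanobis distance
  z = zeros(1, N);
  t = zeros(1, c);
  for i = 1:N
      for j = 1:c
          d = X(:,i) - m(:,j);
          if eOrM
              t(j) = sqrt(d' * d);
          else
              t(j) = sqrt(d' * (S \ d));   % same as d'*inv(S)*d without forming inv(S)
          end
      end
      [~, z(i)] = min(t);   % index of the nearest class mean
  end
end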
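For readers without MATLAB, here is a rough Python equivalent of `euc_mal_classifier`, a sketch that assumes 2-D data and a 2-by-2 shared covariance (so the inverse can be written in closed form). The name `euc_mal_classify` and its argument layout (a list of mean tuples, a list of points) are hypothetical, chosen for illustration.

```python
import math

def euc_mal_classify(means, S, X, euclidean):
    """Assign each 2-D point in X to the nearest class mean (1-based labels)."""
    # Closed-form inverse of the 2x2 shared covariance matrix S.
    (a, b), (c, d) = S
    det = a * d - b * c
    Sinv = [[d / det, -b / det], [-c / det, a / det]]
    labels = []
    for x in X:
        dists = []
        for m in means:
            dx, dy = x[0] - m[0], x[1] - m[1]
            if euclidean:
                d2 = dx * dx + dy * dy
            else:
                # Quadratic form (x - m)^T S^{-1} (x - m)
                d2 = (dx * (Sinv[0][0] * dx + Sinv[0][1] * dy)
                      + dy * (Sinv[1][0] * dx + Sinv[1][1] * dy))
            dists.append(math.sqrt(d2))
        # index(min(...)) returns the first nearest class, like MATLAB's min
        labels.append(dists.index(min(dists)) + 1)
    return labels

means = [(1, 1), (12, 8), (16, 1)]
S = [[4, 0], [0, 4]]
print(euc_mal_classify(means, S, [(0, 0), (12, 9), (15, 0)], True))
```

With the question's means and S = [4 0; 0 4], both settings of `euclidean` return [1, 2, 3] for these three points.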

Answer 1

The reason there is no difference in the classification lies in the covariance matrix.

Suppose the offset from a point to the center (mean) of a class is [x, y].

For the Euclidean classifier, the distance is:

sqrt(x*x + y*y);

For the Mahalanobis classifier:

The inverse of the covariance matrix S = [a,0;0,a] (in the question, a = 4) is:

inv([a,0;0,a]) = [1/a,0;0,1/a]

and the distance is:

sqrt(x*x*1/a + y*y*1/a) = 1/sqrt(a)* sqrt(x*x + y*y)

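So the Mahalanobis distance to each class is the Euclidean distance multiplied by a scale factor, 1/sqrt(a). Because that factor is the same for all classes and all dimensions, it does not change which class is nearest, so you will not see any difference in the class assignments!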
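The same argument written for a general class mean m_j and any a > 0, using the notation of the question (S is the shared covariance):

```latex
\begin{aligned}
d_M^2(\mathbf{x}, \mathbf{m}_j) &= (\mathbf{x}-\mathbf{m}_j)^\top S^{-1} (\mathbf{x}-\mathbf{m}_j),
  \qquad S = aI \;\Rightarrow\; S^{-1} = \tfrac{1}{a} I, \\
d_M^2(\mathbf{x}, \mathbf{m}_j) &= \tfrac{1}{a}\,\lVert \mathbf{x}-\mathbf{m}_j \rVert^2
  = \tfrac{1}{a}\, d_E^2(\mathbf{x}, \mathbf{m}_j), \\
\arg\min_j d_M(\mathbf{x}, \mathbf{m}_j) &= \arg\min_j d_E(\mathbf{x}, \mathbf{m}_j).
\end{aligned}
```

Multiplying every candidate distance by the same positive constant leaves the position of the minimum unchanged.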

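Test it with a different covariance matrix (one that is not a multiple of the identity) and you will see that the errors differ.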
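A small Python check of that claim. It borrows the means and the shared covariance S = [7 4; 4 5] used in Answer 2 (det(S) = 19, so S^{-1} = [5 -4; -4 7]/19). The point (13, 4) and the helper name `nearest` are hypothetical, picked by hand for illustration. Squared distances are compared, which gives the same nearest class as comparing the distances themselves.

```python
import math

def nearest(x, means, Sinv=None):
    """1-based index of the nearest mean: Euclidean if Sinv is None, else Mahalanobis."""
    best, best_d = None, math.inf
    for j, (mx, my) in enumerate(means, start=1):
        dx, dy = x[0] - mx, x[1] - my
        if Sinv is None:
            d2 = dx * dx + dy * dy
        else:
            # (x - m)^T S^{-1} (x - m) for a 2x2 inverse
            d2 = (Sinv[0][0] * dx * dx + (Sinv[0][1] + Sinv[1][0]) * dx * dy
                  + Sinv[1][1] * dy * dy)
        if d2 < best_d:
            best, best_d = j, d2
    return best

means = [(1, 1), (10, 5), (11, 1)]
Sinv = [[5 / 19, -4 / 19], [-4 / 19, 7 / 19]]
x = (13, 4)
print(nearest(x, means), nearest(x, means, Sinv))  # 2 3
```

The squared Euclidean distances to classes 2 and 3 are 10 and 13, while the squared Mahalanobis distances are 76/19 = 4 and 35/19 ≈ 1.84. The two rules pick different classes for the same point.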

Answer 2

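With data like this, whose covariance matrix is a multiple of the identity (S = 4I), all the classifiers should give almost the same performance. Let's look at data whose covariance matrix is not of that form; here the three classifiers give different errors: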

err_bayesian =
0.0861
err_euclidean =
0.1331
err_mahalanobis =
0.0871

close('all'); clear;

% Generate and plot the training dataset X.
% generate_gauss_classes, plot_data, bayes_classifier, Gaussian_ML_estimate,
% euclidean_classifier and mahalanobis_classifier are not built-in MATLAB
% functions; they must be on the path.
m1=[1, 1]'; m2=[10, 5]'; m3=[11, 1]';
m=[m1 m2 m3];

S1 = [7 4 ; 4 5];     % shared covariance, not a multiple of the identity
S(:,:,1)=S1;
S(:,:,2)=S1;
S(:,:,3)=S1;

P=[1/3 1/3 1/3];
N=1000;
randn('seed',0);      % legacy seeding syntax (newer MATLAB uses rng)
[X,y]   =generate_gauss_classes(m,S,P,N);
plot_data(X,y,m,1);

% Generate the test dataset X4 with a different seed
randn('seed',200);
[X4,y1] =generate_gauss_classes(m,S,P,N);

% 2.5_b.1 Apply the Bayesian classifier (true means and covariance) to X4
z_bayesian=bayes_classifier(m,S,P,X4);

% 2.5_b.2 Compute ML estimates of the mean values and of the covariance matrix
% (common to all three classes) using the function Gaussian_ML_estimate
class1_data=X(:,y==1);
[m1_hat, S1_hat]=Gaussian_ML_estimate(class1_data);
class2_data=X(:,y==2);
[m2_hat, S2_hat]=Gaussian_ML_estimate(class2_data);
class3_data=X(:,y==3);
[m3_hat, S3_hat]=Gaussian_ML_estimate(class3_data);
S_hat=(1/3)*(S1_hat+S2_hat+S3_hat);   % average of the per-class estimates
m_hat=[m1_hat m2_hat m3_hat];

% Apply the Euclidean distance classifier, using the ML estimates of the means,
% to classify the data vectors of the test set X4
z_euclidean=euclidean_classifier(m_hat,X4);

% 2.5_b.3 Similarly, for the Mahalanobis distance classifier
z_mahalanobis=mahalanobis_classifier(m_hat,S_hat,X4);

% 2.5_c Compute the error probability of each classifier
err_bayesian = (1-length(find(y1==z_bayesian))/length(y1))
err_euclidean = (1-length(find(y1==z_euclidean))/length(y1))
err_mahalanobis = (1-length(find(y1==z_mahalanobis))/length(y1))
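Below is a stdlib-only Python sketch of the same pipeline, for readers without those helper functions. It assumes 333 points per class for both the training and test sets, and `sample`, `ml_estimates`, `classify` and `error_rate` are hypothetical stand-ins written here, not the MATLAB helpers. The Bayesian classifier is left out: with equal priors and a shared covariance it reduces to the minimum-Mahalanobis-distance rule evaluated with the true parameters. Because the random generator differs, the exact numbers above will not be reproduced.

```python
import math
import random

S = [[7.0, 4.0], [4.0, 5.0]]                      # shared covariance from the script above
MEANS = [(1.0, 1.0), (10.0, 5.0), (11.0, 1.0)]

def sample(rng, n_per_class):
    """Draw n_per_class points per class from N(m_j, S) via the Cholesky factor of S."""
    l11 = math.sqrt(S[0][0])
    l21 = S[1][0] / l11
    l22 = math.sqrt(S[1][1] - l21 * l21)
    X, y = [], []
    for label, (mx, my) in enumerate(MEANS, start=1):
        for _ in range(n_per_class):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            X.append((mx + l11 * z1, my + l21 * z1 + l22 * z2))
            y.append(label)
    return X, y

def ml_estimates(X, y):
    """ML class means and the average of the per-class ML covariances (1/n normalisation)."""
    means, pooled = [], [[0.0, 0.0], [0.0, 0.0]]
    for label in (1, 2, 3):
        pts = [p for p, t in zip(X, y) if t == label]
        n = len(pts)
        mx = sum(p[0] for p in pts) / n
        my = sum(p[1] for p in pts) / n
        means.append((mx, my))
        for p in pts:
            d = (p[0] - mx, p[1] - my)
            for r in range(2):
                for c in range(2):
                    pooled[r][c] += d[r] * d[c] / n / 3
    return means, pooled

def classify(X, means, S_hat=None):
    """Minimum Euclidean (S_hat is None) or Mahalanobis distance labels, 1-based."""
    if S_hat is not None:
        (a, b), (c, d) = S_hat
        det = a * d - b * c
        inv = [[d / det, -b / det], [-c / det, a / det]]
    labels = []
    for x0, x1 in X:
        dists = []
        for mx, my in means:
            dx, dy = x0 - mx, x1 - my
            if S_hat is None:
                dists.append(dx * dx + dy * dy)
            else:
                dists.append(inv[0][0] * dx * dx + (inv[0][1] + inv[1][0]) * dx * dy
                             + inv[1][1] * dy * dy)
        labels.append(dists.index(min(dists)) + 1)
    return labels

def error_rate(pred, truth):
    return sum(p != t for p, t in zip(pred, truth)) / len(truth)

rng = random.Random(0)
X_train, y_train = sample(rng, 333)
X_test, y_test = sample(rng, 333)
m_hat, S_hat = ml_estimates(X_train, y_train)
err_euc = error_rate(classify(X_test, m_hat), y_test)
err_mah = error_rate(classify(X_test, m_hat, S_hat), y_test)
print(f"euclidean {err_euc:.4f}  mahalanobis {err_mah:.4f}")
```

With a covariance that is not a multiple of the identity, the two error rates no longer coincide, and the Mahalanobis rule (which accounts for the correlation between the two features) should come out lower, in line with the MATLAB results reported above.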

Why do the Euclidean and Mahalanobis classifiers always return the same error?
