Hierarchical clustering in MATLAB

Time: 2018-01-12 15:17:00

Tags: matlab hierarchical-clustering linkage

I cluster my data X with hierarchical clustering as follows:

X = [1 1 1;
     2 2 2;
     1 1 0;
     1 2 2];
Y = pdist(X);
T = linkage(Y, 'complete');
c = cluster(T,'maxclust',2);
  

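So X(1,:) and X(3,:) belong to cluster #1, and the other rows belong to cluster #2.

How can I determine which cluster a new data point (one that is not in X) should be assigned to? For example, which cluster does [1 0 1] belong to?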

1 Answer:

Answer 0: (score: 0)

A simple solution is to find the nearest cluster centroid.

Nearest centroid

x_new = [1 0 1];

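% Find cluster centroids (row cid holds the centroid of cluster cid)
X_c = zeros(numel(unique(c)), size(X,2));
for cid = unique(c)'
   % Average over rows explicitly so a single-member cluster still gives a row vector
   X_c(cid,:) = mean(X(c == cid,:), 1);
end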

% Find closest centroid
[~,c_new] = min(pdist2(x_new,X_c));
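
To label several new points in one call, the same idea could be sketched as below. This is only a minimal sketch and not part of the original answer: X_new and the values in its second row are made up for illustration, d_min is a name I introduced, and it assumes the Statistics and Machine Learning Toolbox is available for pdist, linkage, cluster and pdist2.

```matlab
% Sketch only: requires the Statistics and Machine Learning Toolbox
% (pdist, linkage, cluster, pdist2).
X = [1 1 1;
     2 2 2;
     1 1 0;
     1 2 2];
c = cluster(linkage(pdist(X), 'complete'), 'maxclust', 2);

% Centroid of each cluster, one row per cluster ID
ids = unique(c)';
X_c = zeros(numel(ids), size(X, 2));
for cid = ids
    X_c(cid, :) = mean(X(c == cid, :), 1);
end

% Several new points at once (hypothetical values), one point per row
X_new = [1 0 1;
         2 2 3];

% pdist2 returns a size(X_new,1)-by-numel(ids) distance matrix;
% taking the minimum along each row gives one cluster label per new point
[d_min, c_new] = min(pdist2(X_new, X_c), [], 2);
```

On the question's data, c_new(1) equals c(1), the cluster that holds X(1,:) and X(3,:), and c_new(2) equals c(2), the cluster that holds the other two rows.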

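If you have more samples and want to take variance into account, you can compute the z-score of the Euclidean distance.

Z-score of the distance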

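x_new = [1 0 1];
X_means = zeros(1,numel(unique(c)));
X_stds = zeros(1,numel(unique(c)));
X_c = zeros(numel(unique(c)), size(X,2));
for cid = unique(c)'
   % Centroid of this cluster (mean over rows)
   X_c(cid,:) = mean(X(c == cid,:), 1);

   % Mean and standard deviation of the member-to-centroid distances
   distances = pdist2(X(c == cid,:), X_c(cid,:));
   X_means(cid) = mean(distances);
   X_stds(cid) = std(distances);
end
% Pick the cluster with the smallest distance z-score.
% If X_stds(cid) is 0 (for example, a single-member cluster), the division gives Inf or NaN.
[~,c_new] = min((pdist2(x_new,X_c) - X_means)./X_stds);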

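If you want to account for the variance of each component, you can take the z-score of the per-component distances (I'm not sure how this result differs from the one above...)

Mean z-score of the component distances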

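x_new = [1 0 1];
X_means = zeros(numel(unique(c)),size(X,2));
X_stds = zeros(numel(unique(c)),size(X,2));
X_c = zeros(numel(unique(c)), size(X,2));
for cid = unique(c)'
   members = X(c == cid,:);
   X_c(cid,:) = mean(members, 1);

   % Absolute per-component deviation of each member from the centroid
   comp_distances = abs(members - repmat(X_c(cid,:),[size(members,1),1]));
   X_means(cid,:) = mean(comp_distances, 1);
   X_stds(cid,:) = std(comp_distances, 0, 1);
end
% Per-component z-scores of the new point, averaged across components.
% abs() matches how comp_distances is computed inside the loop.
new_dist = abs(repmat(x_new,[size(X_c,1),1]) - X_c);
[~,c_new] = min(mean((new_dist - X_means)./X_stds,2));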