Nearest points between two clusters in MATLAB

Asked: 2013-08-07 05:47:02

Tags: matlab cluster-analysis

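I have a set of clusters made up of 3D points. I want to get the two nearest points between every two clusters.

For example: I have 5 clusters, C1 to C5, made up of 3D points. For C1 and C2 there are two points, Pc1 (a point in C1) and Pc2 (a point in C2), which are the two closest points between clusters C1 and C2. The same goes for C1 and C3..C5, for C2 and C3..C5, and so on. After that I will have 20 points representing the nearest points between the different clusters.

The second thing is that I want to connect these points together if the distance between them is less than a certain distance (a threshold).

So I am asking whether anyone could advise me.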

Update:

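Thanks to Amro for the answer. I updated it to CIDX = kmeans(X,K,'distance','cityblock','replicates',5); to resolve the empty-cluster error. But then another error appeared: "Out of memory in pdistmex. Type HELP MEMORY for your options." So I checked your answer here: Out of memory error while using clusterdata in MATLAB, and updated your code accordingly, but now there is an indexing error at this line: mn = min(min(D(idx1,idx2))); Is there a workaround for this error?

Code used: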

%function  single_linkage(depth,clrr)
X = randn(5000,3);
%X=XX;
% clr = clrr;
K=7;
clr = jet(K);
%// cluster into K=4
K = 7;
%CIDX = kmeans(X,K);


%// pairwise distances
SUBSET_SIZE = 1000;            %# subset size
ind = randperm(size(X,1));
data = X(ind(1:SUBSET_SIZE), :);
D = squareform(pdist(data));
subs = 1:size(D,1);
CIDX=kmeans(D, K,'distance','sqEuclidean', 'replicates',5);
centers = zeros(K, size(data,2));
for i=1:size(data,2)
    centers(:,i) = accumarray(CIDX, data(:,i), [], @mean);
end

%# calculate distance of each instance to all cluster centers
D = zeros(size(X,1), K);
for k=1:K
    D(:,k) = sum( bsxfun(@minus, X, centers(k,:)).^2, 2);
end
%D=squareform(D);
%# assign each instance to the closest cluster
[~,clustIDX] = min(D, [], 2);
%// for each pair of clusters
cpairs = nchoosek(1:K,2);
pairs = zeros(size(cpairs)); 
dists = zeros(size(cpairs,1),1);
for i=1:size(cpairs,1)
    %// index of points assigned to each of the two cluster
    idx1 = (clustIDX == cpairs(i,1));
    idx2 = (clustIDX == cpairs(i,2));

    %// shortest distance between the two clusters
    mn = min(min(D(idx1,idx2)));
    dists(i) = mn;

    %// corresponding pair of points with the minimum distance
    [r,c] = find(D(idx1,idx2)==mn);
    s1 = subs(idx1); s2 = subs(idx2);
    pairs(i,:) = [s1(r) s2(c)];
end

%// filter pairs by keeping only those whose distances is below a threshold
thresh = inf;
cpairs(dist>thresh,:) = [];

%// plot 3D points color-coded by clusters
figure('renderer','zbuffer')
%clr = lines(K);
h = zeros(1,K);
for i=1:K
h(i) = line(X(CIDX==i,1), X(CIDX==i,2), X(CIDX==i,3), ...
    'Color',clr(i,:), 'LineStyle','none', 'Marker','.', 'MarkerSize',5);
end
legend(h, num2str((1:K)', 'C%d'))   %'
view(3), axis vis3d, grid on

%// mark and connect nearest points between each pair of clusters
for i=1:size(pairs,1)
    line(X(pairs(i,:),1), X(pairs(i,:),2), X(pairs(i,:),3), ...
        'Color','k', 'LineStyle','-', 'LineWidth',3, ...
        'Marker','o', 'MarkerSize',10);
end

1 Answer:

Answer 0 (score: 1)

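What you are asking for sounds similar to what single-linkage clustering does at each step: starting from the bottom, the clusters separated by the shortest distance are merged.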
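For reference, here is a minimal sketch of single-linkage clustering itself. It assumes the Statistics Toolbox functions `linkage` and `cluster` are available; the toy data and variable names are illustrative and not part of the original answer. Each row of `Z` records one merge, and its third column is the shortest distance between the two clusters being merged.

```matlab
%// toy data: two well-separated groups of 3D points (illustrative only)
X = [0 0 0; 0.1 0 0; 0 0.1 0; 5 5 5; 5.1 5 5; 5 5.1 5];

%// agglomerative tree built from the shortest inter-cluster distance
Z = linkage(X, 'single');

%// cut the tree into 2 clusters
T = cluster(Z, 'maxclust', 2);

%// Z(:,3) holds the merge distances; the last merge joins the two groups
disp(Z(:,3)')
```

With n points, `Z` has n-1 rows, and for single linkage the merge distances in `Z(:,3)` never decrease from one row to the next.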

Anyway, below is a brute-force way of solving this. I am sure there are more efficient implementations, but this one is easy to implement.

%// data of 3D points
X = randn(5000,3);

%// cluster into K=4
K = 4;
CIDX = kmeans(X,K);

%// pairwise distances
D = squareform(pdist(X));
subs = 1:size(X,1);

%// for each pair of clusters
cpairs = nchoosek(1:K,2);
pairs = zeros(size(cpairs));
dists = zeros(size(cpairs,1),1);
for i=1:size(cpairs,1)
    %// index of points assigned to each of the two cluster
    idx1 = (CIDX == cpairs(i,1));
    idx2 = (CIDX == cpairs(i,2));

    %// shortest distance between the two clusters
    mn = min(min(D(idx1,idx2)));
    dists(i) = mn;

    %// corresponding pair of points with the minimum distance
    %// (keep only the first match in case several pairs tie)
    [r,c] = find(D(idx1,idx2)==mn, 1);
    s1 = subs(idx1); s2 = subs(idx2);
    pairs(i,:) = [s1(r) s2(c)];
end

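%// filter pairs by keeping only those whose distance does not exceed a threshold
thresh = inf;    %// use your threshold value instead
keep = (dists <= thresh);
cpairs = cpairs(keep,:);
pairs = pairs(keep,:);    %// the plot below uses pairs, so filter it as well
dists = dists(keep);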

%// plot 3D points color-coded by clusters
figure('renderer','zbuffer')
clr = lines(K);
h = zeros(1,K);
for i=1:K
    h(i) = line(X(CIDX==i,1), X(CIDX==i,2), X(CIDX==i,3), ...
        'Color',clr(i,:), 'LineStyle','none', ...
        'Marker','.', 'MarkerSize',5);
end
legend(h, num2str((1:K)', 'C%d'))   %'
view(3), axis vis3d, grid on

%// mark and connect nearest points between each pair of clusters
for i=1:size(pairs,1)
    line(X(pairs(i,:),1), X(pairs(i,:),2), X(pairs(i,:),3), ...
        'Color','k', 'LineStyle','-', 'LineWidth',3, ...
        'Marker','o', 'MarkerSize',10);
end

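[Figure: 3D points]

Note that in the example above the data is randomly generated and not very interesting, so it is hard to see the connected nearest points.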
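Regarding the update in the question: the out-of-memory error most likely comes from squareform(pdist(X)), which builds a full N-by-N matrix. In the updated code, D is then overwritten with an N-by-K matrix of distances to the cluster centers, so D(idx1,idx2) applies a length-N logical mask to only K columns, which would explain the indexing error. The following is a sketch of one possible workaround, not the original answer's method: it computes distances for one pair of clusters at a time with `pdist2` (assumed available from the Statistics Toolbox). The smaller data size, the fixed seed, and the name `Dsub` are assumptions made for illustration.

```matlab
%// illustrative data and labels (smaller than the original example)
rng(0);                          %// fixed seed so the run is repeatable
X = randn(500,3);
K = 4;
CIDX = kmeans(X, K, 'replicates', 5);

cpairs = nchoosek(1:K,2);
pairs = zeros(size(cpairs));
dists = zeros(size(cpairs,1),1);
for i=1:size(cpairs,1)
    %// row indices (into X) of the points in each of the two clusters
    s1 = find(CIDX == cpairs(i,1));
    s2 = find(CIDX == cpairs(i,2));

    %// distances between these two clusters only: numel(s1)-by-numel(s2)
    Dsub = pdist2(X(s1,:), X(s2,:));

    %// minimum distance and its position inside the submatrix
    [dists(i), k] = min(Dsub(:));
    [r,c] = ind2sub(size(Dsub), k);
    pairs(i,:) = [s1(r) s2(c)];
end
```

Only one submatrix is held in memory at a time, and `pairs` still contains row indices into X, so the threshold filter and the plotting code from the answer can be reused unchanged.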

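Just for fun, here is another result where I simply replaced the minimum distance with the maximum distance between a pair of clusters (similar to complete-linkage clustering), i.e. using: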

mx = max(max(D(idx1,idx2)));

instead of the previous:

mn = min(min(D(idx1,idx2)));

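[Figure: max linkage]

This shows how we connect the farthest points between each pair of clusters. In my opinion this visualization is more interesting :)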
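To make the min/max swap concrete, here is a small hand-checkable sketch on two tiny clusters. The data and variable names are made up for illustration, and `pdist2` is again assumed to be available from the Statistics Toolbox.

```matlab
%// two tiny clusters on the x-axis (illustrative data)
A = [0 0 0; 1 0 0];
B = [3 0 0; 6 0 0];

%// distances between every point of A and every point of B
D = pdist2(A, B);            %// [3 6; 2 5]

%// nearest pair (single-linkage style)
[mn, k] = min(D(:));
[rmin, cmin] = ind2sub(size(D), k);

%// farthest pair (complete-linkage style)
[mx, k] = max(D(:));
[rmax, cmax] = ind2sub(size(D), k);

fprintf('nearest:  A(%d) - B(%d), distance %g\n', rmin, cmin, mn);
fprintf('farthest: A(%d) - B(%d), distance %g\n', rmax, cmax, mx);
```

Here the nearest pair is A(2) and B(1) at distance 2, and the farthest pair is A(1) and B(2) at distance 6.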