Question

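I have a k-means clustering step that must split the data into 2 clusters. This step is repeated in a loop until a condition is met, so in the end I might have 20 clusters. I do it this way because I don't want to specify the number of clusters in advance, so it has to keep splitting into 2.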

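How can I do this in Matlab? I am using a loop, but the problem is that I have to change the cluster numbers after merging the data. Is there a function that does this on its own, so that I don't have to assign the new cluster numbers myself?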

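The following example is one possible approach; there may be others. Any method is fine as long as it gives the combined clustering result. For example, first loop: 1 2 2 1 2 1 1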

Second loop (it takes the first cluster, splits it into 2 clusters, and then combines that with the previous result):

[1 1 1 1] after splitting into 2 clusters => [1 1 2 1] => combined with the previous loop: [1 1 3 1] (it picks 3 because cluster 2 already exists)

It then takes the first cluster (of the sub-cluster) again:

[1 1 1] after clustering => [1 1 2] => combined with the previous loop: [1 1 3 4]

Here is an example:

My code:

[IDX,C,SUMD] = SpectralClustering(G, k); % k is two
.
.
.
if Wav > w % Wav is the average weight of the cluster
    Gi = subgraph(G, IDX==1); % IDX is the cluster number
    Ctemp = union(Ctemp, SpectralClustering(Gi, k)); % k is 2
else
    Ctemp = union(Ctemp, IDX);
end

C = Ctemp;
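
The bookkeeping in the worked example above (one half of a split keeps the parent's number, the other half gets the next unused number) can be sketched as follows. This is only an illustration under stated assumptions: `subResults` is a hypothetical stand-in holding the sub-cluster labels from the example, where real code would call `kmeans` or `SpectralClustering`, and `labels`, `target`, `members` and `nextLabel` are names introduced here.

```matlab
% Global labels after the first loop (taken from the example above).
labels = [1 2 2 1 2 1 1];

% Hypothetical sub-clustering results for cluster 1, copied from the
% example; in practice these would come from the clustering routine.
subResults = {[1 1 2 1], [1 1 2]};

target = 1;  % label of the cluster that is split in each pass
for s = 1:numel(subResults)
    sub = subResults{s};
    members = find(labels == target);  % positions of that cluster in the full vector
    nextLabel = max(labels) + 1;       % smallest label not used so far
    % Points in sub-cluster 1 keep the parent's label; points in
    % sub-cluster 2 receive the new label.
    labels(members(sub == 2)) = nextLabel;
end

disp(labels)
```

Indexing through `members` writes the sub-cluster result back into the full-length label vector, so the labels of the other clusters are never touched and no renumbering after the merge is needed.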

Answer 1

As mentioned in my comment, the cluster labels will be unique if they are derived from the parent cluster's label:

function [clusters] =  clusterExample(data, parentCluster)

% On each level, cluster data into two clusters based on value relative to
% quantiles (AKA the median, when k = 2)

% Stop clustering if the ratio of the standard deviation of the cluster
% to the mean of the cluster is <= .1
% This is an arbitrary stopping rule for this example
k = 2;

cutOffPoint = quantile(data, (1 / k));

clusters = nan(1, length(data));
clusters(data <= cutOffPoint) = (parentCluster * 10) + 1;
clusters(data > cutOffPoint) =  (parentCluster * 10) + 2;
clusterLabels = unique(clusters);
for g = clusterLabels
   clusterIdx = clusters == g;
   clusterMean = mean(data(clusterIdx));
   clusterSD = std(data(clusterIdx));
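   % Recurse only if the cluster is still too spread out and the split
   % actually separated the data (guards against endless recursion on ties)
   if (clusterSD / clusterMean) > .1 && nnz(clusterIdx) < numel(data)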
       clusters(clusterIdx) = clusterExample(data(clusterIdx), g);
   end
end

In use:

data = rand(1,100);
startCluster = 0;
clusters = clusterExample(data, startCluster);

Each resulting cluster has an SD of at most 10% of the cluster mean (the arbitrary stopping rule used in this example).
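
The labels produced this way encode the path through the splits (for example 12 -> 121 and 122), so they are unique but not consecutive, and they gain one digit per level of recursion. If consecutive numbers 1..N are preferred, as in the question, one possible follow-up step is to remap them with the third output of `unique`. The sketch below uses made-up label values of that form rather than actual output of `clusterExample`, and the names `distinctLabels` and `compact` are introduced here for illustration.

```matlab
% Illustrative hierarchical labels (not actual output of clusterExample).
clusters = [11 121 122 11 22 21 122];

% The third output of unique gives, for every element, the index of its
% value in the sorted list of distinct labels, i.e. a number in 1..N.
[distinctLabels, ~, compact] = unique(clusters);
compact = reshape(compact, size(clusters));  % keep the original row shape

disp(distinctLabels)
disp(compact)
```

The mapping is reversible, since `distinctLabels(compact)` reproduces the original hierarchical labels.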

Combining the results of k-means clustering in a loop
