Question

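I am implementing the k-means algorithm in MATLAB without using the built-in kmeans function. The stopping criterion is that the new centroids do not change from one iteration to the next, but I can't get this to work in MATLAB. Can anyone help?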

Thanks

Answer 1

Using "no change" as the stopping criterion is a bad idea. There are a few main reasons why you shouldn't use a 0-change condition:

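Even for well-behaved functions, the difference between 0 change and a very small change (say 1e-5) can be 1000+ iterations, so you waste time trying to make the values exactly equal. This matters all the more because computers usually carry far more digits than we care about. If you only need 1 digit of precision, why wait for the computer to find the answer to within 1e-31?
Computers have floating-point error everywhere. Try an easily invertible matrix operation such as a = rand(3,3); b = a*a*inv(a); a-b. In theory this should be 0, but you will find that it isn't. These errors alone can keep your program from ever stopping.
Jitter. Suppose we have a 1D k-means problem with 3 numbers that we want to split into 2 groups. On one iteration the grouping might be a,b vs c. On the next it might be a vs b,c, then a,b vs c again, and so on. This is of course a simplified example, but there can be cases where a few data points bounce between clusters, and you end up with a never-ending algorithm. Because those few points keep getting reassigned, the change is never 0.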
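To make the floating-point point concrete, here is a minimal sketch comparing an exact equality test with a tolerance test. The tolerance `tol` and the fixed matrix `a` are assumptions chosen for illustration (a fixed matrix replaces `rand(3,3)` only so that the run is repeatable).

```matlab
% Exact comparison of doubles: 0.1 + 0.2 is not stored as exactly 0.3
x = 0.1 + 0.2;
exact_match = (x == 0.3);             % false in IEEE double precision
tol = 1e-8;                           % assumed tolerance, tune to your problem
close_match = abs(x - 0.3) < tol;     % true

% Same idea for the matrix example (fixed, well-conditioned matrix)
a = [4 1 2; 1 3 0.5; 2 0.5 5] / 10;
b = a*a*inv(a);
residual = max(abs(a(:) - b(:)));     % expected to be round-off sized
matrix_close = residual < tol;        % tolerance test instead of isequal(a, b)

fprintf('exact: %d, within tol: %d, residual: %g\n', exact_match, close_match, residual);
```

A stopping test written as `change == 0` behaves like `exact_match` here, while a threshold test behaves like `close_match`.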

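The solution is to use a delta threshold. Basically, you subtract the previous value from the current one, and if the difference is smaller than the threshold you are done. This is robust on its own, but as with any loop you need a backup escape plan: a max_iterations variable. Take a look at MATLAB's documentation for kmeans; even it has a MaxIter parameter (default 100), so even if your k-means doesn't converge, at least it won't run forever. Something like this might work: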

%problem specific
max_iter = 100;

%choose a small number appropriate to your problem
thresh = 1e-3;

%curr_mu must already hold your initial centroids before the loop starts

%ensures it runs the first time
delta_mu = thresh + 1;
num_iter = 0;

%do your kmeans in the loop
while (delta_mu > thresh && num_iter < max_iter)
   %save these right away
   old_mu = curr_mu;

   %calculate new means and variances, this is the standard kmeans iteration
   %then store the values in a variable called curr_mu
   curr_mu = newly_calculate_values;

   %use the two norm to find the delta as a single number, no matter what
   %the original dimensionality of mu was. If old_mu - curr_mu is
   %0 the norm is still 0, so it behaves well as a distance measure.
   delta_mu = norm(old_mu - curr_mu,2);

   num_iter = num_iter + 1;
end
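
For reference, the skeleton above could be filled in roughly as follows. This is only a sketch under stated assumptions: the toy data `X`, the choice `k = 2`, and the initialization from rows 1 and 5 are all hypothetical, and the Frobenius norm `norm(..., 'fro')` is swapped in for `norm(..., 2)` so that the delta is the Euclidean length of all centroid movements taken together.

```matlab
% Toy data: two well-separated 2-D blobs (hypothetical, for illustration only)
X = [0 0; 0 1; 1 0; 1 1; 9 9; 9 10; 10 9; 10 10];
k = 2;
curr_mu = X([1 5], :);        % assumed initialization: one point from each blob

max_iter = 100;
thresh   = 1e-3;
delta_mu = thresh + 1;        % ensures the loop runs at least once
num_iter = 0;

while (delta_mu > thresh && num_iter < max_iter)
    old_mu = curr_mu;

    % Assignment step: squared Euclidean distance from each point to each centroid
    d = zeros(size(X,1), k);
    for j = 1:k
        diffs = X - repmat(old_mu(j,:), size(X,1), 1);
        d(:,j) = sum(diffs.^2, 2);
    end
    [~, labels] = min(d, [], 2);

    % Update step: each centroid becomes the mean of its assigned points
    for j = 1:k
        members = X(labels == j, :);
        if ~isempty(members)          % keep the old centroid if a cluster empties
            curr_mu(j,:) = mean(members, 1);
        end
    end

    delta_mu = norm(old_mu - curr_mu, 'fro');   % total centroid movement
    num_iter = num_iter + 1;
end

fprintf('stopped after %d iterations, delta = %g\n', num_iter, delta_mu);
```

On this toy data the centroids settle on the two blob centers, and the loop exits through the threshold test rather than through max_iter.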

Edit

In case you didn't know, the 2-norm of a vector is basically the Euclidean distance.
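
One caveat worth checking in your own code: for a vector, `norm(v, 2)` is the Euclidean length, but for a matrix MATLAB's `norm(M, 2)` returns the largest singular value, and it is `norm(M, 'fro')` that gives the square root of the sum of squared entries. Both are zero only when the matrix is zero, so either works as a stopping measure. The small values below are made up for illustration.

```matlab
v = [3 4];
d_vec = norm(v, 2);          % Euclidean length of a vector: sqrt(3^2 + 4^2)

M = [3 0; 0 4];              % e.g. a 2-by-2 matrix of centroid differences
d_spec = norm(M, 2);         % matrix 2-norm: largest singular value
d_fro  = norm(M, 'fro');     % Frobenius norm: sqrt of the sum of squared entries

fprintf('vector: %g, matrix 2-norm: %g, Frobenius: %g\n', d_vec, d_spec, d_fro);
```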

K-means stopping criterion in MATLAB?
