Question

Is there an online version of the k-means clustering algorithm?

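By online I mean that every data point is processed serially, one at a time as it enters the system, which saves computing time when the algorithm is used in real time.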

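I have written one myself with good results, but I would really prefer to have something "standardized" to refer to, since it will be used in my master's thesis.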

Also, does anyone have suggestions for other online clustering algorithms? (lmgtfy failed ;))

Answer 1

Yes, there is. Google failed to find it because it is more commonly known as "sequential k-means".

You can find two pseudocode implementations of sequential k-means in this section of some Princeton CS class notes by Richard Duda. I have reproduced one of the two implementations below:

Make initial guesses for the means m1, m2, ..., mk
Set the counts n1, n2, ..., nk to zero
Until interrupted
    Acquire the next example, x
    If mi is closest to x
        Increment ni
        Replace mi by mi + (1/ni)*( x - mi)
    end_if
end_until

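The beautiful thing about it is that you only need to remember the mean of each cluster and the number of data points assigned to that cluster. Once you have updated those two variables, you can throw the data point away.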
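For what it's worth, here is a minimal Python sketch of the pseudocode above. It is my own illustration, not code from the class notes: the class name `SequentialKMeans`, the method name `update`, the use of NumPy, and the choice of Euclidean distance for "closest" are all assumptions. The update line is the incremental form of the arithmetic mean, so each mean stays equal to the average of the points assigned to it so far.

```python
import numpy as np


class SequentialKMeans:
    """Sequential (online) k-means; names here are illustrative, not standard."""

    def __init__(self, initial_means):
        # Initial guesses for the means m1..mk, one row per cluster.
        self.means = np.array(initial_means, dtype=float)
        # The counts n1..nk start at zero.
        self.counts = np.zeros(len(self.means), dtype=int)

    def update(self, x):
        """Assign one example to its nearest mean and update that mean."""
        x = np.asarray(x, dtype=float)
        # Index i of the mean closest to x (Euclidean distance is an assumption).
        i = int(np.argmin(np.linalg.norm(self.means - x, axis=1)))
        self.counts[i] += 1
        # mi <- mi + (1/ni) * (x - mi)
        self.means[i] += (x - self.means[i]) / self.counts[i]
        return i


model = SequentialKMeans(initial_means=[[0.0, 0.0], [10.0, 10.0]])
for point in [[1.0, 1.0], [9.0, 11.0], [3.0, 1.0], [11.0, 9.0]]:
    model.update(point)  # the point can be discarded after this call

print(model.means)   # per-cluster running means
print(model.counts)  # per-cluster number of assigned points
```

Note that with the counts starting at zero, the first point assigned to a cluster replaces that cluster's initial guess entirely, because the step size 1/ni is 1 at that moment.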

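I'm not sure where you would be able to find a citation for it. I would start by looking in Duda's classic text Pattern Classification and Scene Analysis or the newer edition, Pattern Classification. If it's not there, you could try Chris Bishop's newer book or Daphne Koller and Nir Friedman's more recent text.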
