How to override the Euclidean distance with another distance in the K-means clustering algorithm (sklearn)

Time: 2018-12-03 09:23:53

Tags: machine-learning scikit-learn k-means euclidean-distance wmd

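I have a set of documents and I want to group the related ones together. Currently I load Google's pretrained news vectors (GoogleNews-vectors-negative300.bin) to get word vectors, and I use the WMD (Word Mover's Distance) algorithm to compute the distance between two documents. Now I want to integrate this with K-means clustering. Basically, I want to override the distance calculation in KMeans. How can I do that? Any suggestions are welcome. Thanks in advance.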

1 answer:

Answer 0: (score: 2)

Although it is possible in theory to implement k-means with other distance measures, it is not advised: the algorithm may no longer converge. A more detailed discussion can be found e.g. on StackExchange. That's why scikit-learn's KMeans does not offer other distance metrics.
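To illustrate why convergence can break, here is a minimal sketch. It relies on the fact that the k-means update step moves each centroid to the mean of its cluster, and the mean minimizes the sum of squared Euclidean distances but not necessarily other costs. The toy points and the helper names (`total_cost`, `manhattan`, `squared_euclidean`) are made up for illustration.

```python
def total_cost(points, center, distance):
    """Sum of distances from every point to a candidate cluster center."""
    return sum(distance(p, center) for p in points)


def squared_euclidean(a, b):
    return (a - b) ** 2


def manhattan(a, b):
    return abs(a - b)


# One-dimensional toy cluster (illustrative data only).
points = [0.0, 0.0, 10.0]
mean = sum(points) / len(points)          # the k-means centroid update
median = sorted(points)[len(points) // 2]

# Under squared Euclidean distance the mean is the better center ...
print(total_cost(points, mean, squared_euclidean),
      total_cost(points, median, squared_euclidean))

# ... but under Manhattan distance the median beats the mean, so the
# "move the centroid to the mean" step no longer minimizes the objective.
print(total_cost(points, mean, manhattan),
      total_cost(points, median, manhattan))
```

With a different distance, the assignment step and the mean-based update step optimize different objectives, so the total cost is no longer guaranteed to decrease at every iteration.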

I'd suggest using e.g. hierarchical clustering, where you can plug in an arbitrary distance function.
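A rough sketch of that approach, using SciPy's hierarchical clustering on a precomputed pairwise distance matrix. Note that `document_distance` here is a hypothetical stand-in (Jaccard distance on token sets) so the example runs without the word-vector file. In practice you would swap in your WMD computation, for example gensim's `KeyedVectors.wmdistance`, assuming that is what you are already using. The sample documents and the `cluster_documents` helper are likewise made up.

```python
from itertools import combinations

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def document_distance(doc_a, doc_b):
    """Hypothetical stand-in for WMD: Jaccard distance between token sets."""
    tokens_a = set(doc_a.lower().split())
    tokens_b = set(doc_b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0
    return 1.0 - len(tokens_a & tokens_b) / len(union)


def cluster_documents(docs, n_clusters, distance=document_distance):
    """Cluster documents hierarchically from a custom pairwise distance."""
    n = len(docs)
    dist = np.zeros((n, n))
    for i, j in combinations(range(n), 2):
        dist[i, j] = dist[j, i] = distance(docs[i], docs[j])
    # linkage() expects the condensed (upper-triangle) form of the matrix.
    condensed = squareform(dist, checks=False)
    # Average linkage works with arbitrary distances; Ward assumes Euclidean.
    tree = linkage(condensed, method="average")
    # Cut the tree so that at most n_clusters flat clusters remain.
    return fcluster(tree, t=n_clusters, criterion="maxclust")


docs = [
    "the cat sat on the mat",
    "a cat sat on a mat",
    "stock markets fell sharply today",
    "stock markets rose sharply today",
]
labels = cluster_documents(docs, n_clusters=2)
print(labels)
```

Since WMD is expensive, computing the full pairwise matrix once up front and passing it in precomputed form avoids recomputing distances during clustering.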