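I have some documents, and I only want to group the related ones together. Currently I'm using Google's News vector file (GoogleNews-vectors-negative300.bin) to get word vectors, and the WMD (Word Mover's Distance) algorithm to get the distance between two documents. Now I want to integrate this with K-means clustering. Basically, I want to override the distance computation function in KMeans. How can I do that? Any suggestions are most welcome. Thanks in advance.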
Although it is possible in theory to implement k-means with other distance measures, it is not advised: your algorithm may stop converging. A more detailed discussion can be found e.g. on StackExchange. That's why scikit-learn's KMeans does not support distance metrics other than Euclidean.
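To see why, recall that each k-means iteration moves every centroid to the mean of its assigned points. The mean is the point that minimizes the sum of squared Euclidean distances, so under that metric the update step can never increase the objective. Under another distance the mean is generally not the minimizer, so that guarantee is lost. A small 1-D check, using absolute (L1) distance as an illustrative stand-in for "some other metric" (the helper names are hypothetical):

```python
import numpy as np


def total_cost(points, center, dist):
    """Sum of dist(p, center) over all points: what the centroid update should minimize."""
    return sum(dist(p, center) for p in points)


def squared_euclidean(a, b):
    return (a - b) ** 2


def absolute(a, b):
    return abs(a - b)


points = np.array([0.0, 0.0, 10.0])
mean = points.mean()        # what the k-means update step always picks
median = np.median(points)  # the minimizer of the summed absolute distance

for name, dist in [("squared Euclidean", squared_euclidean), ("absolute (L1)", absolute)]:
    print(f"{name:18s} cost at mean: {total_cost(points, mean, dist):6.2f}  "
          f"cost at median: {total_cost(points, median, dist):6.2f}")
```

Under squared Euclidean distance the mean gives the lower cost; under L1 the median does, so a "k-means" that keeps averaging while measuring L1 distance is optimizing the wrong thing in its update step.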
I'd suggest using e.g. hierarchical clustering instead, where you can plug in an arbitrary distance function.
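A minimal sketch of that suggestion, assuming SciPy's agglomerative clustering (`scipy.cluster.hierarchy.linkage` and `fcluster`) on a precomputed distance matrix. The distance is just a callable, so WMD can be dropped in. The helper names and the Jaccard distance used in the demo are hypothetical, chosen so the example runs without the GoogleNews vectors:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def pairwise_distance_matrix(docs, dist):
    """Symmetric n x n matrix of dist(doc_i, doc_j); dist can be any callable."""
    n = len(docs)
    matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i, j] = matrix[j, i] = dist(docs[i], docs[j])
    return matrix


def cluster_documents(docs, dist, n_clusters, method="average"):
    """Agglomerative clustering with an arbitrary document distance.

    'average', 'complete' and 'single' linkage only need pairwise distances;
    'ward' and 'centroid' assume Euclidean geometry, so they are not used here.
    """
    matrix = pairwise_distance_matrix(docs, dist)
    condensed = squareform(matrix)  # linkage() expects the condensed upper triangle
    tree = linkage(condensed, method=method)
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def jaccard_distance(a, b):
    # Hypothetical stand-in for WMD so that the demo is self-contained.
    a, b = set(a), set(b)
    return 1.0 - len(a & b) / len(a | b)


docs = [
    "the cat sat on the mat".split(),
    "a cat sat on a mat".split(),
    "stock markets fell sharply today".split(),
    "markets fell as stock prices dropped".split(),
]
labels = cluster_documents(docs, jaccard_distance, n_clusters=2)
print(labels)  # the two "cat" documents share one label, the two "markets" documents the other
```

To use WMD instead, one option (an assumption about your setup, not tested here) is gensim: load the vectors with `KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)` and pass its `wmdistance` method as `dist`, since it takes two lists of tokens. Depending on the gensim version it needs an extra optimal-transport package installed, so check the docs for your release. Building the matrix costs n(n-1)/2 WMD evaluations, which can be slow for large corpora, and a document with no in-vocabulary words may get an infinite distance, which `linkage` rejects, so such documents are best filtered out first.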