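I use KMeans to split my items into a set of clusters, and then I want to compute cosine similarity within each cluster. Unfortunately, IndexedRowMatrix can only be built from an RDD, and as far as I know it is not possible to create or use an RDD inside an operation on another RDD.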
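import org.apache.spark.ml.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix

val df = ...
val kmeans = new KMeans().setK(10).setSeed(1L)
val model: KMeansModel = kmeans.fit(df)
// transform appends a "prediction" column holding the cluster id
val predictions = model.transform(df)
// group rows by cluster id (needs spark.implicits._ in scope for the Int encoder)
val clusters = predictions.groupByKey(row => row.getAs[Int]("prediction"))
clusters.mapGroups {
  case (key, cluster) =>
    ...
    // cluster is a local Iterator[Row] here, not an RDD
    val indexedRowMatrix = new IndexedRowMatrix(....) // How to pass an RDD to build IndexedRowMatrix
    // transpose so that columnSimilarities() compares the original rows
    indexedRowMatrix.toBlockMatrix().transpose.toIndexedRowMatrix().columnSimilarities()
}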
Any ideas on how to compute cosine similarity after grouping by key?
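
One direction I can think of, as a minimal sketch and not a tested solution: since the group passed to mapGroups is a local iterator, the similarity could be computed with plain Scala collections instead of a distributed matrix. This assumes each cluster fits in the memory of a single executor. The object and method names below (LocalCosine, cosine, pairwise) are hypothetical helpers, not part of Spark, and the column names in the usage comment ("id", "features") are assumptions about the DataFrame.

```scala
// Hypothetical helpers for illustration; none of these names come from Spark.
object LocalCosine {

  /** Cosine similarity of two equal-length dense vectors; 0.0 if either has zero norm. */
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    var i = 0
    while (i < a.length) {
      dot += a(i) * b(i)
      normA += a(i) * a(i)
      normB += b(i) * b(i)
      i += 1
    }
    if (normA == 0.0 || normB == 0.0) 0.0
    else dot / (math.sqrt(normA) * math.sqrt(normB))
  }

  /** All unordered pairs (i < j) of one cluster, as (idA, idB, similarity). */
  def pairwise(items: Seq[(Long, Array[Double])]): Seq[(Long, Long, Double)] = {
    val indexed = items.toIndexedSeq
    for {
      i <- indexed.indices
      j <- (i + 1) until indexed.length
    } yield (indexed(i)._1, indexed(j)._1, cosine(indexed(i)._2, indexed(j)._2))
  }
}

// Possible use inside the grouped Dataset (assumes columns "id": Long and
// "features": org.apache.spark.ml.linalg.Vector, and spark.implicits._ in scope):
//
//   clusters.flatMapGroups { (key, cluster) =>
//     val items = cluster.map { r =>
//       (r.getAs[Long]("id"),
//        r.getAs[org.apache.spark.ml.linalg.Vector]("features").toArray)
//     }.toSeq
//     LocalCosine.pairwise(items).map { case (a, b, sim) => (key, a, b, sim) }
//   }
```

The pairwise step is quadratic in the cluster size, so this only seems reasonable when the clusters are small. For large clusters, filtering predictions once per cluster id on the driver and building one IndexedRowMatrix per cluster may be a better fit.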