Question

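I want to create a summary that contains the main points of the original document. To do this, I use the Universal Sentence Encoder (https://tfhub.dev/google/universal-sentence-encoder/2) to compute sentence embeddings. After that, I want to apply clustering to the vectors. For each cluster, I want to select the sentence whose vector representation is closest to the cluster center. But I don't know how to do this. Do you have any suggestions or libraries?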

I have already tried using the sklearn library:

import numpy as np
from sklearn.cluster import KMeans

n_clusters = np.ceil(len(encoded)**0.5)
kmeans = KMeans(n_clusters=n_clusters)
kmeans = kmeans.fit(encoded)

But I get an error message: 'numpy.float64' object cannot be interpreted as an integer

Answer 1

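Everything you need is already in sklearn.

You just need to start writing the code yourself, rather than only looking for examples to copy and paste.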
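
As one possible illustration of what sklearn already provides, here is a minimal sketch of the step the question asks about: picking, for each cluster, the sentence whose embedding is closest to the cluster center. The `sentences` list and the random `encoded` array are hypothetical stand-ins for the real text and the Universal Sentence Encoder output, and the `n_init` and `random_state` values are arbitrary choices made only for reproducibility.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min

# Hypothetical stand-ins: in the question, `encoded` would come from the
# Universal Sentence Encoder and `sentences` would hold the matching text.
rng = np.random.default_rng(0)
sentences = [f"sentence {i}" for i in range(16)]
encoded = rng.normal(size=(len(sentences), 8))

# KMeans needs a plain integer for n_clusters.
n_clusters = int(np.ceil(len(encoded) ** 0.5))
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(encoded)

# For each cluster center, find the index of the nearest embedding.
closest, _ = pairwise_distances_argmin_min(kmeans.cluster_centers_, encoded)

# Keep the selected sentences in document order and drop duplicates.
summary = [sentences[i] for i in sorted(set(closest.tolist()))]
```

Sorting the selected indices keeps the summary in the order of the original document, which is a design choice of this sketch rather than something the question requires.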

Answer 2

The problem is caused by this line:

n_clusters = np.ceil(len(encoded)**0.5)

KMeans expects to receive an integer as the number of clusters, and np.ceil returns a numpy.float64, so simply convert the result to an integer:

n_clusters = int(np.ceil(len(encoded)**0.5))
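
A short sketch of why the conversion matters, assuming `encoded` is a 2-D array of sentence embeddings (random placeholder data is used here, and `n_init` and `random_state` are arbitrary settings chosen only for reproducibility):

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder for the real sentence embeddings (10 sentences, 4 dimensions).
rng = np.random.default_rng(0)
encoded = rng.normal(size=(10, 4))

raw = np.ceil(len(encoded) ** 0.5)  # np.ceil returns a numpy.float64
n_clusters = int(raw)               # convert to a plain Python int

kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(encoded)
print(type(raw).__name__, n_clusters, kmeans.cluster_centers_.shape)
```

An alternative with the same effect is `math.ceil`, which returns a Python int directly.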

How to apply clustering to sentence embeddings?
