Dimension problem when feeding Word2vec-generated embeddings to the KMeans algorithm

Time: 2019-05-25 18:55:52

Tags: python scikit-learn nltk k-means word2vec

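I am trying to feed word2vec embeddings to the KMeans algorithm. I am using both the nltk library and the scikit-learn library for KMeans, but I am running into a dimension problem.

I am using this tutorial as a reference: http://ai.intelligentonlinetools.com/ml/tag/k-means-clustering-example/

w2v_sentence_embeddings holds the embeddings built from the Word2vec word embeddings.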
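
For context, a common way to build such sentence vectors is to average the Word2vec vectors of the words in each sentence. The sketch below is only an assumption about how `w2v_sentence_embeddings` could be constructed, not the tutorial's exact code. `sentence_vector`, `word_vectors` and `dim` are hypothetical names, and a plain dict stands in for a trained Word2vec model.

```python
import numpy as np

def sentence_vector(tokens, word_vectors, dim):
    """Average the vectors of in-vocabulary tokens; zero vector if none are known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        # Without this fallback, an all-unknown sentence would yield an empty result
        return np.zeros(dim)
    return np.mean(known, axis=0)

# Toy stand-in for a trained model: word -> vector (hypothetical data)
word_vectors = {
    "cat": np.array([1.0, 0.0, 0.0]),
    "dog": np.array([0.0, 1.0, 0.0]),
}
toy_sentences = [["cat", "dog"], ["unknown", "words"], ["dog"]]
embeddings = np.array([sentence_vector(s, word_vectors, dim=3) for s in toy_sentences])
print(embeddings.shape)  # (3, 3)
```

Because every sentence maps to a vector of the same length, the result stacks into a regular 2-D array of shape (number of sentences, dim), which is the form a clusterer expects.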

from nltk.cluster import KMeansClusterer
import nltk
import numpy as np 

Total number of clusters:

CLUSTER_NUMS=25

kmeans_clustering = KMeansClusterer(CLUSTER_NUMS, distance=nltk.cluster.util.euclidean_distance, repeats=25)
cluster_assigned = kmeans_clustering.cluster(w2v_sentence_embeddings, assign_clusters=True)
print(cluster_assigned)

for index, sentence in enumerate(sentences):
    print(str(cluster_assigned[index]) + ":" + str(sentence))

With scikit-learn: ValueError: setting an array element with a sequence.
With NLTK KMeans: ValueError: operands could not be broadcast together with shapes (0,) (100,)
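
The shapes (0,) and (100,) in the NLTK message suggest that at least one entry of `w2v_sentence_embeddings` is empty while the others have 100 dimensions. That would also be consistent with the scikit-learn message, since a ragged list cannot be converted into a 2-D array. This is a guess from the error text, not a confirmed diagnosis. A minimal check, with `find_bad_rows` as a hypothetical helper and toy data in place of the real embeddings:

```python
import numpy as np

def find_bad_rows(embeddings, dim):
    """Return the indices of entries whose shape is not (dim,)."""
    return [i for i, v in enumerate(embeddings) if np.shape(v) != (dim,)]

# Toy data: the second entry is empty, as the (0,) in the error suggests
toy = [np.ones(100), np.array([]), np.ones(100)]
bad = find_bad_rows(toy, dim=100)
print(bad)  # [1]

# Keep only the well-formed rows and stack them into one 2-D array for clustering
clean = np.vstack([v for i, v in enumerate(toy) if i not in bad])
print(clean.shape)  # (2, 100)
```

If rows are dropped this way, the matching entries of `sentences` would need to be dropped as well, so that the indices still line up in the final printing loop.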

0 answers:

No answers