I am doing topic modeling in Gensim.
I have managed to get the document index and the similarity percentage for each document.
This is what I am trying:
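from gensim import corpora, models, similarities

documents = ["Say to other what you feel",
             "Speak truth from your heart and tell people",
             "what this book say and tell about lying"]

# Remove common words and tokenize. The stop-word list below is an
# assumption; the original snippet left this step out.
stoplist = set("for a of the and to in".split())
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]
tfidf = models.TfidfModel(corpus)
corpus_tfidf = tfidf[corpus]
lsi = models.LsiModel(corpus_tfidf, id2word=dictionary, num_topics=2)
corpus_lsi = lsi[corpus_tfidf]
index = similarities.MatrixSimilarity(lsi[corpus])

doc = "Always tell people what in your heart"
vec_bow = dictionary.doc2bow(doc.lower().split())
vec_lsi = lsi[vec_bow]
sims = index[vec_lsi]
print(list(enumerate(sims)))  # (position in documents, similarity) pairs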
Output:
[(0, 0.74419993), (1, 0.99159265), (2, 0.35600105)]
In each pair, the first element is the document's index number in the documents array and the second is the similarity percentage.
I would like a result like this instead:
[(myid_123, 0.74419993), (abc_1, 0.99159265), (id_3, 0.35600105)]
In each pair, the first element would be my own string ID for the document in the documents array and the second would be the similarity percentage.
I tried something like the following, but it did not work:
documents = {"myid_123": "Say to other what you feel",
"abc_1": "Speak truth from your heart and tell people",
"id_3": "what this book say and tell about lying"}
How can I assign my own IDs to the documents? Is this possible in Gensim? If so, do you have an example?
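
To make the goal concrete: since the similarity index returns scores by position, one possible approach is to keep a parallel list of IDs in the same order as the corpus and pair them with the scores afterwards. The sketch below is only an illustration of that idea, not a built-in Gensim feature. The helper `label_similarities` is a hypothetical name, and `sims` is a plain list standing in for the array returned by `index[vec_lsi]`, using the values from the output shown earlier.

```python
# Documents keyed by custom string IDs, as in the attempt above.
documents = {
    "myid_123": "Say to other what you feel",
    "abc_1": "Speak truth from your heart and tell people",
    "id_3": "what this book say and tell about lying",
}

# Fix the order once: position i in doc_ids corresponds to position i
# in doc_texts, and therefore to position i in the corpus built from it.
doc_ids = list(documents.keys())
doc_texts = [documents[doc_id] for doc_id in doc_ids]

# Stand-in for `sims = index[vec_lsi]` (assumption: values copied from
# the output in the question rather than computed here).
sims = [0.74419993, 0.99159265, 0.35600105]


def label_similarities(ids, scores):
    """Pair each positional similarity score with the ID at that position."""
    if len(ids) != len(scores):
        raise ValueError("ids and scores must have the same length")
    return [(doc_id, float(score)) for doc_id, score in zip(ids, scores)]


labeled = label_similarities(doc_ids, sims)

# Optionally rank from most to least similar.
ranked = sorted(labeled, key=lambda pair: pair[1], reverse=True)

print(labeled)
print(ranked)
```

In the real pipeline, `doc_texts` would be tokenized and passed to `corpora.Dictionary` in place of the original list, so the ordering of `doc_ids` stays aligned with the index.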