How to get the top documents (e.g. 5) that belong to each topic in LDA

Posted: 2019-03-25 22:29:14

Tags: lda topic-modeling

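Basically, I have built an LDA model that gives me 5 topics for my survey data. For each topic, I want to pull example responses (documents) from the survey, namely the ones with the highest probability score (or similarity) for that topic.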

I tried the code below, but it gives me no ordering and no probability showing how similar each document is to the topic:

import gensim
from itertools import chain

LDA = gensim.models.ldamodel.LdaModel

# Build LDA model
lda_model = LDA(corpus=doc_term_matrix, id2word=dictionary, num_topics=5,
                random_state=100, chunksize=1000, passes=50,
                minimum_probability=0)

# Topic distribution of every document: a list of (topic_id, probability) pairs per document
lda_corpus = list(lda_model[doc_term_matrix])

# Threshold = mean of all topic probabilities + 0.3
scores = list(chain(*[[score for topic_id, score in topic] for topic in lda_corpus]))
threshold = sum(scores) / len(scores)
threshold = threshold + 0.3
print(threshold)

# One cluster per topic: the survey responses whose probability for that topic exceeds the threshold.
# i[k][1] is the probability of the k-th (topic_id, probability) pair, assumed to be topic k.
texts = input_df['Serv_DlrRec_Verb']
cluster1 = [j for i, j in zip(lda_corpus, texts) if i[0][1] > threshold]
cluster2 = [j for i, j in zip(lda_corpus, texts) if i[1][1] > threshold]
cluster3 = [j for i, j in zip(lda_corpus, texts) if i[2][1] > threshold]
cluster4 = [j for i, j in zip(lda_corpus, texts) if i[3][1] > threshold]
cluster5 = [j for i, j in zip(lda_corpus, texts) if i[4][1] > threshold]
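A minimal sketch of ranking instead of thresholding, assuming the per-document topic lists are in gensim's format (one list of (topic_id, probability) pairs per document, e.g. from `lda_model.get_document_topics(bow, minimum_probability=0.0)` for each `bow` in `doc_term_matrix`). The helper name `top_documents_per_topic` is hypothetical. For each topic it collects every document's probability for that topic, looked up by topic id rather than by list position, and keeps the `top_n` highest, so the result is ordered and carries the scores.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def top_documents_per_topic(
    doc_topics: Sequence[Sequence[Tuple[int, float]]],
    num_topics: int,
    top_n: int = 5,
) -> Dict[int, List[Tuple[int, float]]]:
    """For each topic id, return up to top_n (doc_index, probability) pairs,
    sorted by descending probability. (Hypothetical helper.)"""
    per_topic: Dict[int, List[Tuple[int, float]]] = {t: [] for t in range(num_topics)}
    for doc_idx, topics in enumerate(doc_topics):
        # Use the topic id from each pair instead of the list position:
        # gensim can omit topics whose probability falls below its cutoff.
        for topic_id, prob in topics:
            per_topic[topic_id].append((doc_idx, prob))
    # heapq.nlargest returns the top_n pairs ordered from highest to lowest probability
    return {
        t: heapq.nlargest(top_n, pairs, key=lambda p: p[1])
        for t, pairs in per_topic.items()
    }


if __name__ == "__main__":
    # Toy topic distributions for 3 documents and 2 topics (made-up numbers)
    doc_topics = [
        [(0, 0.9), (1, 0.1)],
        [(0, 0.2), (1, 0.8)],
        [(0, 0.6), (1, 0.4)],
    ]
    print(top_documents_per_topic(doc_topics, num_topics=2, top_n=2))
    # {0: [(0, 0.9), (2, 0.6)], 1: [(1, 0.8), (2, 0.4)]}
```

The returned document indices can then be mapped back to the survey text, for example `[input_df['Serv_DlrRec_Verb'].iloc[i] for i, _ in top[k]]` for topic `k`, assuming the rows of `input_df` are in the same order as `doc_term_matrix`.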

Can anyone help?

0 answers