While building a Gensim LDA model, I created the dictionary for my data with the following commands:
from gensim.corpora import Dictionary
dictionary1 = Dictionary(docs)
dictionary1.filter_extremes(no_below=10, no_above=0.75, keep_n=1000)
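Of these 1000 most frequent tokens, I manually removed 500 so that the remaining tokens relate directly to the topics I want to generate. How can I build the corpus from this reduced Dictionary, and in what form should I pass it to train my LDA model?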
Answer (score: 0)
You can build the corpus from the filtered dictionary and train the LDA model as follows:
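import gensim  # needed for gensim.models.ldamodel.LdaModel below

# Build the corpus: each document becomes a bag-of-words list of (token_id, count).
# doc2bow ignores tokens that are not in dictionary1, so the tokens you removed
# are dropped from the corpus automatically.
corpus = [dictionary1.doc2bow(content) for content in docs]

# Train an LDA model with 5 topics over 100 passes.
# The number of topics is chosen arbitrarily here; tune it for your data.
# More passes usually help convergence but increase training time.
lda_model = gensim.models.ldamodel.LdaModel(corpus, num_topics=5, id2word=dictionary1, passes=100)
print(lda_model.print_topics(num_topics=5, num_words=3))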