Question

For reference, I have already looked at the following questions:

I want the LDA model I trained with Gensim to classify a sentence into one of the topics the model created.

Something along the lines of:

lda = models.LdaModel(corpus=corpus, id2word=id2word, num_topics=7, passes=20)
lda.print_topics()
for line in document: # where each line in the document is its own sentence for simplicity
    print('Sentence: ', line)
    topic = lda.parse(line) # where the classification would occur
    print('Topic: ', topic)

I know gensim has no parse function, so how can I do this? This is the documentation I have been following, but it hasn't gotten me anywhere:

https://radimrehurek.com/gensim/auto_examples/core/run_topics_and_transformations.html#sphx-glr-auto-examples-core-run-topics-and-transformations-py

Thanks in advance.

Edit: more documentation: https://radimrehurek.com/gensim/models/ldamodel.html

Answer 1

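Let me make sure I understand your problem: you want to train an LDA model on some documents and retrieve 7 topics. Then you want to assign each new document to one (or more?) of those topics, which means inferring the topic distribution of new, unseen documents.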

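If so, the gensim documentation already covers this: convert each new sentence to bag-of-words with the same dictionary used for training, then index the trained model with that bag-of-words.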

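from gensim import models

lda = models.LdaModel(corpus=corpus, id2word=id2word, num_topics=7, passes=20)
lda.print_topics()
for count, line in enumerate(document, start=1):  # each line in the document is its own sentence for simplicity
    print('\nSentence: ', line)
    # Tokenize the same way the training corpus was tokenized (a plain whitespace split here)
    line_bow = id2word.doc2bow(line.split())
    # lda[bow] returns a list of (topic_id, probability) pairs
    doc_lda = lda[line_bow]
    # Pick the pair with the highest probability; plain max(doc_lda) would pick the highest topic id
    topic_id, prob = max(doc_lda, key=lambda pair: pair[1])
    print('\nLine ' + str(count) + ' assigned to Topic ' + str(topic_id) + ' with ' + str(round(prob * 100, 2)) + '% probability!')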
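Indexing the model with a bag-of-words returns a list of `(topic_id, probability)` pairs, so a plain `max()` over that list compares topic ids, not probabilities. Below is a minimal pure-Python sketch, assuming that list shape, of picking the single most likely topic and of the "one or more" case (every topic above a threshold). `dominant_topic`, `topics_above`, the threshold and the sample distribution are hypothetical illustrations, not part of the gensim API.

```python
from typing import List, Tuple

# Assumed shape of lda[bow]: a list of (topic_id, probability) pairs
TopicDist = List[Tuple[int, float]]


def dominant_topic(doc_topics: TopicDist) -> Tuple[int, float]:
    """Return the (topic_id, probability) pair with the highest probability."""
    if not doc_topics:
        raise ValueError("empty topic distribution")
    # key= compares by probability; without it, tuples compare by topic id first
    return max(doc_topics, key=lambda pair: pair[1])


def topics_above(doc_topics: TopicDist, threshold: float) -> List[int]:
    """Return every topic id whose probability is >= threshold, most likely first."""
    ranked = sorted(doc_topics, key=lambda pair: pair[1], reverse=True)
    return [topic_id for topic_id, prob in ranked if prob >= threshold]


# Hypothetical distribution for one sentence; real values depend on the trained model
doc_lda = [(0, 0.05), (2, 0.61), (4, 0.20), (6, 0.14)]

print(max(doc_lda))                 # (6, 0.14): largest topic id, not the likeliest topic
print(dominant_topic(doc_lda))      # (2, 0.61)
print(topics_above(doc_lda, 0.15))  # [2, 4]
```

Which of the two to use depends on whether each sentence should get exactly one label or may belong to several topics; the threshold value is a judgment call for your data.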

Classifying text using a Gensim LDA model
