Classifying text with a Gensim LDA model

Date: 2020-04-13 22:47:13

Tags: python-3.x gensim lda

For reference, I have already looked at the following questions:

  1. Gensim LDA for text classification
  2. Python Gensim LDA Model show_topics function

I want the LDA model I trained with Gensim to classify a sentence into one of the topics the model created.

Something along the lines of:
lda = models.LdaModel(corpus=corpus, id2word=id2word, num_topics=7, passes=20)
lda.print_topics()
for line in document: # where each line in the document is its own sentence for simplicity
    print('Sentence: ', line)
    topic = lda.parse(line) # where the classification would occur
    print('Topic: ', topic)

I know gensim has no parse function, but how can I do this? This is the documentation I have been following, without success:

https://radimrehurek.com/gensim/auto_examples/core/run_topics_and_transformations.html#sphx-glr-auto-examples-core-run-topics-and-transformations-py

Thanks in advance.

Edit: more documentation: https://radimrehurek.com/gensim/models/ldamodel.html

1 answer:

Answer 0 (score: 0)

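Let me make sure I understand your problem: you want to train an LDA model on some documents and retrieve 7 topics. Then you want to assign new documents to one (or more?) of these topics, which means you want to infer the topic distribution of new, unseen documents.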

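If so, the gensim documentation already covers this: convert each sentence to a bag-of-words with the same dictionary used for training, then index the trained model with it (lda[bow]) to get its topic distribution.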

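from gensim import models

lda = models.LdaModel(corpus=corpus, id2word=id2word, num_topics=7, passes=20)
lda.print_topics()
for count, line in enumerate(document, start=1):  # each line of the document is one sentence, for simplicity
    print('\nSentence: ', line)
    tokens = line.split()  # tokenize the same way the training corpus was tokenized
    line_bow = id2word.doc2bow(tokens)  # bag-of-words built with the training dictionary
    doc_lda = lda[line_bow]  # list of (topic_id, probability) pairs
    # Pick the pair with the highest probability; a bare max() would compare topic ids first
    topic_id, prob = max(doc_lda, key=lambda pair: pair[1])
    print(f'\nLine {count} assigned to Topic {topic_id} with {prob * 100:.2f}% probability!')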
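Indexing the model (lda[line_bow]) returns a list of (topic_id, probability) pairs, and gensim may leave out topics whose probability is very low. Python compares tuples element by element, so calling max() on that list without a key returns the pair with the largest topic id rather than the most likely topic. The sketch below is plain Python with made-up numbers; the helper names dominant_topic and topics_above are hypothetical, not part of gensim. The second helper covers the "one (or more?)" case by keeping every topic above a chosen threshold.

```python
from typing import List, Optional, Tuple

# A topic distribution as returned by indexing an LdaModel, e.g. lda[bow]:
# a list of (topic_id, probability) pairs.
TopicDist = List[Tuple[int, float]]


def dominant_topic(doc_topics: TopicDist) -> Optional[Tuple[int, float]]:
    """Return the (topic_id, probability) pair with the highest probability."""
    if not doc_topics:
        return None  # nothing to choose from (e.g. every topic was filtered out)
    # key= compares probabilities; a bare max() would compare topic ids first
    return max(doc_topics, key=lambda pair: pair[1])


def topics_above(doc_topics: TopicDist, threshold: float) -> TopicDist:
    """Return every topic whose probability is >= threshold, most likely first."""
    kept = [pair for pair in doc_topics if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical distribution for one sentence over 7 topics (made-up numbers).
    example = [(0, 0.05), (2, 0.61), (4, 0.22), (6, 0.12)]
    print(max(example))                # (6, 0.12): highest topic id, not highest probability
    print(dominant_topic(example))     # (2, 0.61)
    print(topics_above(example, 0.2))  # [(2, 0.61), (4, 0.22)]
```

The threshold is a judgment call: a sentence that mixes two subjects can have two topics with similar weight, and reporting both may be more honest than forcing a single label.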