Question

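I can extract topics from an LDA model using gensim. When I print the topics, each one shows 10 words by default. I want to display 15 words per topic. I tried to change this, but I still get 10 words per topic. How do I change this default behavior?

The code is as follows: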

import operator
from gensim.models import CoherenceModel

# Score each topic by the coherence of its top 15 words.
for n, topic in model.show_topics(num_topics=-1, num_words=15, formatted=False):
    topic = [word for word, _ in topic]
    cm = CoherenceModel(topics=[topic], texts=documents, dictionary=dictionary, window_size=10)
    coherence_values[n] = cm.get_coherence()
# Rank topics from most to least coherent.
top_topics = sorted(coherence_values.items(), key=operator.itemgetter(1), reverse=True)
result.append((model, top_topics))

And to print the topics:

pprint([lm.show_topic(topicid) for topicid, c_v in top_topics[:8]])

Answer 1

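I think the problem is in the show_topic call. Your loop does retrieve 15 words per topic, but they are not displayed, because show_topic has an optional parameter, topn, that sets how many of the most significant words it returns. Its default is 10, so change the print statement to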

pprint([lm.show_topic(topicid, topn=15) for topicid, c_v in top_topics[:8]])

It should then display all 15 words for each topic.
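If the same lookup is needed in several places, it can be wrapped in a small helper so the word count is set in one spot. This is only a sketch: the name top_words_per_topic and its limit parameter are made up for illustration and are not part of gensim. It assumes model is a trained gensim LDA model (anything with a show_topic(topicid, topn=...) method) and ranked_topics is a list of (topic_id, score) pairs such as top_topics from the question.

```python
def top_words_per_topic(model, ranked_topics, topn=15, limit=8):
    """Return the topn (word, weight) pairs for the first `limit` ranked topics.

    Hypothetical helper, not a gensim function.
    model: object exposing show_topic(topicid, topn=...), e.g. a gensim LdaModel.
    ranked_topics: list of (topic_id, score) pairs, best topic first.
    """
    # Pass topn explicitly; otherwise show_topic falls back to its default of 10.
    return [
        model.show_topic(topic_id, topn=topn)
        for topic_id, _score in ranked_topics[:limit]
    ]
```

With the names from the question, this would be called as pprint(top_words_per_topic(lm, top_topics)), or with topn=20 to show more words.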

How to change the default number_words in LDA
