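This question concerns two different objects in the Python topic-modeling library Gensim. Gensim defines a function `diff()` on objects of type `LdaModel`. It can be used to compare the topics of two LDA models, as follows: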
from gensim.models import LdaModel

lda_model_1 = LdaModel()  # initialization, training etc. omitted
lda_model_2 = LdaModel()  # initialization, training etc. omitted
lda_model_1.diff(lda_model_2)
Here `diff()` has the following docstring:
Signature: model_lda.diff(other, distance='kullback_leibler', num_words=100, n_ann_terms=10, diagonal=False, annotation=True, normed=True)
Docstring:
Calculate the difference in topic distributions between two models: `self` and `other`.
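As far as I can tell from the gensim source, for distances such as `kullback_leibler` `diff()` compares the rows of the two models' topic-term matrices (what `get_topics()` returns) pairwise, one topic of `self` against one topic of `other`. Below is a minimal NumPy sketch of that idea, assuming both matrices have shape (num_topics, num_terms); it is not gensim's code. `topic_diff` is a hypothetical helper, and the epsilon smoothing is my own simplification. The comparison is only meaningful if column j refers to the same word in both matrices, i.e. both models were built on the same dictionary.

```python
import numpy as np


def topic_diff(topics_a: np.ndarray, topics_b: np.ndarray, normed: bool = True) -> np.ndarray:
    """Pairwise KL divergence between the rows of two topic-term matrices.

    Hypothetical helper: assumes both inputs have shape (num_topics, num_terms)
    and share the same term index (column j is the same word in both).
    """
    if topics_a.shape[1] != topics_b.shape[1]:
        raise ValueError("topic-term matrices must share the same vocabulary")
    eps = 1e-12  # avoids log(0); a simplification, not gensim's handling
    a = topics_a / topics_a.sum(axis=1, keepdims=True)  # each row sums to 1
    b = topics_b / topics_b.sum(axis=1, keepdims=True)
    log_a = np.log(a + eps)
    log_b = np.log(b + eps)
    # diff[i, j] = KL(a[i] || b[j]) = sum_w a[i, w] * (log a[i, w] - log b[j, w])
    diff = (a * log_a).sum(axis=1, keepdims=True) - a @ log_b.T
    if normed and diff.max() > 1e-8:
        diff = diff / diff.max()  # scale so the largest distance is 1
    return diff
```

The result is a (topics of A) x (topics of B) matrix; small entries mark pairs of topics with similar word distributions.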
I have now trained an LDA model of type `LdaModel` and an author-topic model of type `AuthorTopicModel`. The latter inherits from `LdaModel`.
Running the following, however, raises an error:
from gensim.models import AuthorTopicModel, LdaModel

lda_model = LdaModel()  # some init code excluded here
at_model = AuthorTopicModel()  # some init code excluded here
at_model.diff(lda_model)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-74-cce096e96139> in <module>()
----> 1 at_model.diff(lda_model)
/home/xxx/lib/python3.5/site-packages/gensim/models/ldamodel.py in diff(self, other, distance, num_words, n_ann_terms, diagonal, annotation, normed)
1428
1429 if not isinstance(other, self.__class__):
-> 1430 raise ValueError("The parameter `other` must be of type `{}`".format(self.__name__))
1431
1432 distance_func = distances[distance]
ValueError: The parameter `other` must be of type `LdaModel`
...whereas the call in the other direction does work:
lda_model.diff(at_model)
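The asymmetry follows from the `isinstance(other, self.__class__)` check shown in the traceback: an `AuthorTopicModel` instance is also an instance of its parent class `LdaModel`, but an `LdaModel` instance is not an `AuthorTopicModel`. A minimal sketch with hypothetical stand-in classes (not gensim's real ones) that reproduces only that check:

```python
class LdaModel:  # hypothetical stand-in for gensim.models.LdaModel
    def diff(self, other):
        # Same check as in the traceback: `other` must be an instance of the
        # caller's own class; instances of subclasses also pass.
        if not isinstance(other, self.__class__):
            raise ValueError("`other` must be an instance of the caller's class")
        return "ok"


class AuthorTopicModel(LdaModel):  # hypothetical stand-in; subclasses LdaModel
    pass


lda_model, at_model = LdaModel(), AuthorTopicModel()
print(lda_model.diff(at_model))  # passes: AuthorTopicModel is a subclass of LdaModel
try:
    at_model.diff(lda_model)  # fails: an LdaModel is not an AuthorTopicModel
except ValueError as err:
    print("ValueError:", err)
```

So the check only verifies the Python type of `other`; it says nothing about whether the two models' topics are comparable.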
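My question is whether Gensim actually produces meaningful output for the `diff` between an LDA model and an author-topic model. I don't understand the internal representations of these models, and I would like to know whether the code above yields a meaningful difference, i.e. a valid assignment between the topics of the two model types.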
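Assuming the distance matrix from `diff()` is meaningful (both models share a dictionary), one way to turn it into a one-to-one assignment between topics is to solve a linear assignment problem on it. A minimal sketch using SciPy's `linear_sum_assignment`; `match_topics` is a hypothetical helper, and the matrix below is made-up toy data, not output from either model.

```python
from typing import List, Tuple

import numpy as np
from scipy.optimize import linear_sum_assignment


def match_topics(diff_matrix: np.ndarray) -> List[Tuple[int, int]]:
    """Pair topics of model A (rows) with topics of model B (columns) so that
    the summed distance is minimal and each topic is used at most once."""
    rows, cols = linear_sum_assignment(diff_matrix)
    return [(int(r), int(c)) for r, c in zip(rows, cols)]


# Toy distance matrix: entry [i, j] is the distance between topic i of A and topic j of B.
toy = np.array([[0.9, 0.1, 0.8],
                [0.2, 0.7, 0.9],
                [0.6, 0.8, 0.05]])
print(match_topics(toy))  # [(0, 1), (1, 0), (2, 2)]
```

Unlike picking the closest topic per row, the assignment never maps two topics of A onto the same topic of B.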