Question

Is there a way to compare LDA model fits in VW (Vowpal Wabbit)? Is the software's progressive loss meaningful for this purpose?

Answer 1

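Running vw -h --lda 1 prints help text listing the following parameters. The metrics parameter is off by default; it is used to compute topic coherence, as implemented here. Try enabling it by passing --metrics 1: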

Latent Dirichlet Allocation:
  --lda arg                             Run lda with <int> topics

  --lda_alpha arg (=0.100000001)        Prior on sparsity of per-document topic
                                        weights
  --lda_rho arg (=0.100000001)          Prior on sparsity of topic 
                                        distributions
  --lda_D arg (=10000)                  Number of documents
  --lda_epsilon arg (=0.00100000005)    Loop convergence threshold
  --minibatch arg (=1)                  Minibatch size, for LDA
  --math-mode arg (=0)                  Math mode: simd, accuracy, fast-approx
  --metrics arg (=0)                    Compute metrics
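To give a rough idea of what a topic-coherence score measures, here is a minimal Python sketch of a UMass-style coherence. This is an assumption about the general technique, not VW's actual --metrics implementation, which may compute something different; the function name umass_coherence and the toy documents are hypothetical.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents, eps=1.0):
    """UMass-style coherence for one topic (illustrative, not VW's code).

    top_words: the topic's top words, most probable first.
    documents: list of token lists used to count co-occurrences.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    # For every pair, condition on the higher-ranked word.
    for i, j in combinations(range(len(top_words)), 2):
        w_hi, w_lo = top_words[i], top_words[j]
        df_hi = doc_freq(w_hi)
        if df_hi == 0:
            continue  # skip words that never occur in the corpus
        score += math.log((doc_freq(w_hi, w_lo) + eps) / df_hi)
    return score


# Hypothetical toy corpus: two "animal" documents, two "driving" documents.
docs = [
    ["cat", "dog"],
    ["cat", "dog", "fish"],
    ["car", "road"],
    ["car", "road", "fuel"],
]
coherent = umass_coherence(["cat", "dog"], docs)
incoherent = umass_coherence(["cat", "road"], docs)
```

Scores closer to zero indicate top words that tend to appear in the same documents, so when comparing runs with different topic counts one would typically average this score over all topics of each model.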

Or jump straight into the source code of vw utility.

A useful demo showing most of the parameters can be found here.

Python: if you are using gensim

(You tagged this question with python.)

If you are using the Python wrapper provided in gensim (< 4.0.0), you can simply work in Gensim as usual: convert the model with vwmodel2ldamodel, or directly use log_perplexity or other measures such as coherence.

A good tutorial on how to compare multiple LDA models can be found here.
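As a rough illustration of the idea behind perplexity-based comparison, here is a minimal sketch that computes per-word perplexity for two toy models on the same documents. It is not gensim's log_perplexity (which, as far as I know, returns a per-word likelihood bound rather than the perplexity itself); the function name per_word_perplexity and all the toy distributions are hypothetical.

```python
import math


def per_word_perplexity(docs, doc_topic, topic_word):
    """Per-word perplexity of documents under a fitted topic model (sketch).

    docs:       list of token lists (ideally held-out documents).
    doc_topic:  doc_topic[d][k] = P(topic k | document d).
    topic_word: topic_word[k][w] = P(word w | topic k), as dicts.
    """
    log_lik = 0.0
    n_tokens = 0
    for d, tokens in enumerate(docs):
        for w in tokens:
            # Mixture probability of the word under the document's topics.
            p = sum(
                theta * topic_word[k].get(w, 0.0)
                for k, theta in enumerate(doc_topic[d])
            )
            log_lik += math.log(max(p, 1e-12))  # guard against log(0)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


docs = [["a", "b"], ["c", "d"]]

# Hypothetical model 1: a single topic, uniform over four words.
uniform = per_word_perplexity(
    docs,
    doc_topic=[[1.0], [1.0]],
    topic_word=[{"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}],
)

# Hypothetical model 2: two topics that separate the two documents.
two_topics = per_word_perplexity(
    docs,
    doc_topic=[[1.0, 0.0], [0.0, 1.0]],
    topic_word=[{"a": 0.5, "b": 0.5}, {"c": 0.5, "d": 0.5}],
)
```

Lower perplexity means the model assigns higher probability to the documents, so the two-topic model fits this toy data better. In practice the documents should be held out from training, and the document-topic weights should be inferred for them by the model being evaluated.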

Answer 2

In R statistical packages, you can diagnose models with procedures such as this:

How to compute the log-likelihood of the LDA model in vowpal wabbit

I have also asked about this over at VW.

Vowpal Wabbit LDA: model selection
