Question

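I am a typical, everyday R user. In R, the very useful lda.collapsed.gibbs.sampler function in the lda package uses a collapsed Gibbs sampler to fit a latent Dirichlet allocation (LDA) model, and returns point estimates of the latent parameters based on the state at the last iteration of Gibbs sampling.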

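This function also has a nice parameter, compute.log.likelihood, which, when set to TRUE, causes the sampler to compute the log likelihood of the words (to within a constant factor) after each sweep over the variables. This is useful for assessing convergence and for comparing different LDA models (fitted with different numbers of topics).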
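To make "the log likelihood of the words" concrete, here is a minimal Python sketch of the standard collapsed-LDA quantity log p(w | z), assuming a symmetric Dirichlet prior eta over words. The function name and the count layout are hypothetical, and this is not necessarily the exact value the lda package reports; it only illustrates the kind of number one would track across sweeps.

```python
from math import lgamma


def word_log_likelihood(topic_word_counts, eta):
    """Return log p(w | z) for LDA with the topic-word distributions integrated out.

    topic_word_counts[k][v] is the number of times word v is currently
    assigned to topic k. eta is the symmetric Dirichlet prior on words
    (an assumption of this sketch).
    """
    total = 0.0
    for counts in topic_word_counts:
        vocab_size = len(counts)
        n_k = sum(counts)
        # Normalising term of the Dirichlet-multinomial for topic k.
        total += lgamma(vocab_size * eta) - lgamma(n_k + vocab_size * eta)
        # Per-word terms for topic k.
        total += sum(lgamma(c + eta) - lgamma(eta) for c in counts)
    return total


# Topics that concentrate on distinct words score higher than topics
# that spread their counts evenly.
sharp = word_log_likelihood([[4, 0], [0, 4]], eta=0.1)
flat = word_log_likelihood([[2, 2], [2, 2]], eta=0.1)
print(sharp > flat)
```

Evaluating this after every sweep and plotting it against the iteration number is one way to judge whether the sampler has settled.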

I would like to know whether vowpal_wabbit's LDA model has such an option.

Answer 1

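Running vw -h --lda 1 prints the following parameters. The metrics parameter is off by default; it is used to compute topic coherence, as implemented here. Try enabling it by passing --metrics 1.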

Latent Dirichlet Allocation:
  --lda arg                             Run lda with <int> topics

  --lda_alpha arg (=0.100000001)        Prior on sparsity of per-document topic
                                        weights
  --lda_rho arg (=0.100000001)          Prior on sparsity of topic 
                                        distributions
  --lda_D arg (=10000)                  Number of documents
  --lda_epsilon arg (=0.00100000005)    Loop convergence threshold
  --minibatch arg (=1)                  Minibatch size, for LDA
  --math-mode arg (=0)                  Math mode: simd, accuracy, fast-approx
  --metrics arg (=0)                    Compute metrics
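As a rough sketch of how these options fit together, the Python snippet below assembles a vw command line from the flags in the help text above, plus -d for the input file. The helper name and the file name docs.vw are hypothetical, and the snippet only builds and prints the command; it does not run vw or show what --metrics reports.

```python
import shlex


def build_vw_lda_command(data_path, num_topics, num_docs, metrics=True):
    """Assemble a vw LDA command line as a list of arguments.

    Only flags from the help output above are used, plus -d for the
    input file. data_path is a placeholder supplied by the caller.
    """
    args = [
        "vw",
        "-d", data_path,
        "--lda", str(num_topics),
        "--lda_D", str(num_docs),
    ]
    if metrics:
        # --metrics is off by default, so it has to be enabled explicitly.
        args += ["--metrics", "1"]
    return args


cmd = build_vw_lda_command("docs.vw", num_topics=20, num_docs=10000)
print(shlex.join(cmd))
# prints: vw -d docs.vw --lda 20 --lda_D 10000 --metrics 1
```

Building the arguments as a list keeps each flag and value separate, so the same list could be handed to subprocess.run if vw is installed.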

Or jump straight into the source code of the vw utility.

A useful demo showing most of the parameters can be found here.

How to compute the log-likelihood of an LDA model in vowpal wabbit
