Question

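I have been studying LDA in depth recently. It mostly makes sense, but I am left with a few questions I cannot find answers to.

For LDA, we first represent the corpus as vectors of word counts: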

            word1   word2   word3   wordN
document1     n       n       n       n
document2     n       n       n       n
documentN     n       n       n       n

Entry (i, j) tells us how many times word j from the vocabulary appears in document i.
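To make the table concrete, here is a minimal sketch of how such a count matrix could be built. It assumes the vocabulary is simply every distinct word seen anywhere in the corpus, which is one common convention and not something stated above. The function name `build_doc_term_matrix` and the toy documents are made up for illustration.

```python
from collections import Counter

def build_doc_term_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] counts vocab[j] in docs[i]."""
    # Naive whitespace tokenization; real pipelines usually do more cleaning.
    tokenized = [doc.lower().split() for doc in docs]
    # Assumption: the vocabulary is the set of all distinct words in the corpus,
    # sorted so that the column order is stable.
    vocab = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Counter returns 0 for words that do not occur in this document.
        matrix.append([counts[word] for word in vocab])
    return vocab, matrix

# Hypothetical toy corpus.
docs = ["apple banana apple", "banana cherry", "cherry cherry apple"]
vocab, matrix = build_doc_term_matrix(docs)
```

With this convention a word only has to occur somewhere in the corpus, so many entries of the matrix are 0.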

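First question)

Do we create the vocabulary V at random from the words in all the documents, or do we choose it so that every word in V appears at least once in every document?

Next, for each document we create a matrix: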

        topic1  topic2  topicN
word1     n       n       n
word2     n       n       n
word3     n       n       n
word4     n       n       n

We choose the topics to be represented in the document and randomly assign words to topics (wordi x topicj = 1 if the word belongs to the topic, 0 otherwise).
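A rough sketch of this random initialization, assuming the usual Gibbs-sampling setup in which every word occurrence (token) gets exactly one topic, and in which we keep running count tables instead of a 0/1 matrix. Documents are assumed to be lists of integer word ids; the function name `random_init` and all variable names are hypothetical.

```python
import random

def random_init(docs, num_topics, seed=0):
    """docs: list of documents, each a list of integer word ids."""
    rng = random.Random(seed)
    vocab_size = 1 + max(w for doc in docs for w in doc)
    z = []  # z[d][i] = topic assigned to the i-th token of document d
    doc_topic = [[0] * num_topics for _ in docs]                # tokens in doc d with topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # times word w has topic k
    topic_total = [0] * num_topics                              # tokens with topic k overall
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)  # uniform random topic for this token
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
    return z, doc_topic, topic_word, topic_total

# Hypothetical toy corpus: 3 documents over a vocabulary of 3 word ids.
docs = [[0, 1, 0], [1, 2], [2, 2, 0]]
z, doc_topic, topic_word, topic_total = random_init(docs, num_topics=2)
```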

Next, for each word, we compute its new topic using the formula

P = P1 * P2

where

P1 = Probability( topic T | document d )
P2 = Probability( word W | topic T )

The new topic K is then assigned to word W with probability P.
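For reference, here is a sketch of how I understand this update in collapsed Gibbs sampling: P1 * P2 is computed for every candidate topic, and the new topic is drawn from the normalized result. The smoothing parameters `alpha` and `beta`, the count tables, and the function name `resample_token` are assumptions of this sketch and are not part of the description above.

```python
import random

def resample_token(d, i, w, z, doc_topic, topic_word, topic_total,
                   alpha=0.1, beta=0.01, rng=random):
    """Resample the topic of token i (word id w) in document d, in place."""
    num_topics = len(topic_total)
    vocab_size = len(topic_word[0])
    # Take the token's current assignment out of the counts.
    old = z[d][i]
    doc_topic[d][old] -= 1
    topic_word[old][w] -= 1
    topic_total[old] -= 1
    # Unnormalized P1 * P2 for every candidate topic k.
    weights = []
    for k in range(num_topics):
        p1 = doc_topic[d][k] + alpha  # proportional to P(topic k | document d)
        p2 = (topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)  # P(word w | topic k)
        weights.append(p1 * p2)
    # Draw the new topic with probability proportional to its weight.
    new = rng.choices(range(num_topics), weights=weights)[0]
    z[d][i] = new
    doc_topic[d][new] += 1
    topic_word[new][w] += 1
    topic_total[new] += 1
    return new

# Hypothetical hand-built state: 2 documents, 2 topics, 3 word ids.
z = [[0, 1, 0], [1, 0]]           # current topic of each token
doc_topic = [[2, 1], [1, 1]]      # per-document topic counts
topic_word = [[2, 0, 1], [0, 2, 0]]
topic_total = [3, 2]
new_topic = resample_token(0, 1, 1, z, doc_topic, topic_word, topic_total,
                           rng=random.Random(0))
```

In this reading there is no single topic T chosen up front; every topic gets a weight, and the draw decides the assignment.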

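Second question)

Which topic do we pick as T, and which topic is it that we assign to word W with probability P? I have not been able to find an answer.

Thank you for your answers.

LDA and choosing topics

0 answers: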