Question

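I have only recently started using LDA at work, but every time I run it (in R), the topics it returns share the same top terms. Essentially, only one latent topic can be distinguished in my results. The problem occurs consistently across different datasets, subjects, and sources. N.B. all of the datasets are within 10,000 rows; could that be too small?

I am using this code: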

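library(dplyr)        # %>%, filter(), count(), top_n()
library(tidytext)     # unnest_tokens(), cast_dtm(), tidy(), stop_words
library(topicmodels)  # LDA()
library(ggplot2)      # plotting

data_DL_dtm <- NPS_Clientidentified %>%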
  filter(!is.na(Comment)) %>%
  unnest_tokens(word, Comment) %>%
  anti_join(stop_words) %>%
  anti_join(custom_stop_words) %>%
  count(`Full Name`, word) %>%
  cast_dtm(`Full Name`, word, n)
nrow(data_DL_tidy)
DL_lda <- LDA(data_DL_dtm, k = 3, control = list(seed = 1234))

DL_topics <- tidy(DL_lda, matrix="beta")
DL_top_terms <- DL_topics %>%
  group_by(topic) %>%
  top_n(10, beta) %>%
  ungroup() %>%
  arrange(topic, -beta)

DL_top_terms %>%
  mutate(term = reorder(term, beta)) %>%
  ggplot(aes(term, beta, fill = factor(topic))) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ topic, scales = "free") +
  coord_flip()

Which gives me this:

https://i.imgur.com/zE2SeFX.png

I would not expect "system" to appear at the top of every topic.

Am I missing something here? Or is there a way to improve my LDA model?

Answer 1

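It looks like your topics have converged on one another, which can happen for a couple of reasons:

You may have run too many iterations
You made too many passes over the corpus

I hope this helps :)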
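
One way to test whether the iteration count matters is to set it explicitly through the `control` argument of `LDA()` and compare the resulting top terms. The following is only a sketch under assumptions: it uses the `AssociatedPress` example data shipped with topicmodels as a stand-in for `data_DL_dtm`, and the iteration values are arbitrary illustrations, not recommended settings.

```r
library(topicmodels)

# Example data shipped with topicmodels; a stand-in for data_DL_dtm (assumption).
data("AssociatedPress", package = "topicmodels")
dtm <- AssociatedPress[1:100, ]

# VEM (the default method): cap the EM and variational iterations.
# The iter.max and tol values here are illustrative only.
lda_vem <- LDA(dtm, k = 3, method = "VEM",
               control = list(seed = 1234,
                              em  = list(iter.max = 50, tol = 1e-4),
                              var = list(iter.max = 100, tol = 1e-6)))

# Gibbs sampling: the run length is set through burnin, iter and thin.
lda_gibbs <- LDA(dtm, k = 3, method = "Gibbs",
                 control = list(seed = 1234, burnin = 100,
                                iter = 200, thin = 10))

# Compare the ten most probable terms per topic under each setting.
terms(lda_vem, 10)
terms(lda_gibbs, 10)
```

If the top terms stay the same across very different iteration settings, the iteration count is probably not the cause.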

LDA does not distinguish multiple topics: what is wrong?
