Topic inference on a new corpus with sklearn LatentDirichletAllocation

Time: 2018-08-02 14:03:27

Tags: python scikit-learn lda topic-modeling

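I have been using the sklearn.decomposition.LatentDirichletAllocation module to explore a corpus of documents. After several rounds of training and tuning the model (adding stop words and synonyms, varying the number of topics), I am fairly happy with the resulting topics and familiar with them. As a next step, I would like to apply the trained model to a new corpus.

Is it possible to apply the fitted model to a new set of documents to determine their topic distributions?

I know this is possible in the gensim library, where you can train a model: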

from gensim.test.utils import common_texts
from gensim.corpora.dictionary import Dictionary
from gensim.models.ldamodel import LdaModel

# Create a corpus from a list of texts
common_dictionary = Dictionary(common_texts)
common_corpus = [common_dictionary.doc2bow(text) for text in common_texts]

lda = LdaModel(common_corpus, num_topics=10)

and then apply the trained model to a new corpus:

# unseen_doc is a bag-of-words vector built with the same dictionary
topic_distributions = lda[unseen_doc]

From: https://radimrehurek.com/gensim/models/ldamodel.html

How can I do this with scikit-learn's implementation of LDA?

1 answer:

Answer 0: (score: 3)

Doesn't calling transform do this?
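As a minimal sketch of what that could look like (the toy documents, variable names, and the choice of CountVectorizer with n_components=2 are illustrative assumptions, not taken from the question): fit the vectorizer and the model on the training corpus, then reuse the same fitted vectorizer on the new documents before calling transform, so that the columns of the new count matrix match the vocabulary the model was trained on.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpora, made up purely for illustration.
train_docs = [
    "cats and dogs are popular pets",
    "dogs chase cats around the garden",
    "stocks and bonds are common investments",
    "investors trade stocks on the market",
]
new_docs = [
    "my dogs like the garden",
    "the market for bonds is large",
]

# Fit the vocabulary and the topic model on the training corpus only.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X_train)

# Reuse the fitted vectorizer (transform, not fit_transform) so the new
# documents are mapped onto the training vocabulary; unknown words are dropped.
X_new = vectorizer.transform(new_docs)

# One row per new document, one column per topic.
doc_topic = lda.transform(X_new)
print(np.round(doc_topic, 3))
```

In recent scikit-learn versions each row returned by transform is a normalized topic distribution for one document. Calling fit_transform on the new corpus instead would refit the vocabulary and break the correspondence with the trained model.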