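I'm doing topic modeling. I want to filter out some words, because all five of my topic plots show the same words, and I don't know the code to do that.
I thought I could use filter() on the words I don't need, but it doesn't work.
These are the news pages I get the tokens from.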
# Packages used by the code in this post
library(dplyr)
library(tidytext)
library(stm)
library(ggplot2)

pagina_9_19_adn <- c("https://www.adnradio.cl/search/estallido%20social/page/9/",
                     "https://www.adnradio.cl/search/estallido%20social/page/10/",
                     "https://www.adnradio.cl/search/estallido%20social/page/11/",
                     "https://www.adnradio.cl/search/estallido%20social/page/12/",
                     "https://www.adnradio.cl/search/estallido%20social/page/13/",
                     "https://www.adnradio.cl/search/estallido%20social/page/14/",
                     "https://www.adnradio.cl/search/estallido%20social/page/15/",
                     "https://www.adnradio.cl/search/estallido%20social/page/16/",
                     "https://www.adnradio.cl/search/estallido%20social/page/17/",
                     "https://www.adnradio.cl/search/estallido%20social/page/18/",
                     "https://www.adnradio.cl/search/estallido%20social/page/19/")
# paginaadn() is my own scraping function, defined elsewhere (not shown here)
tablas_all_pages_adn <- lapply(pagina_9_19_adn, paginaadn)
tablas_all_pages_adn <- bind_rows(tablas_all_pages_adn)
This is the code that gets the tokens from the news articles and fits the model.
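# Tokenize the article text, then drop English and Spanish stop words
tablas_all_pages_adn <- tablas_all_pages_adn %>%
  unnest_tokens(word, texto) %>%
  anti_join(get_stopwords(), by = "word") %>%
  filter(!word %in% stopwords::stopwords("es", source = "stopwords-iso"))

# Count words per headline and cast the counts to a document-feature matrix
tokens_adn <- tablas_all_pages_adn %>%
  count(titulo, word, sort = TRUE) %>%
  cast_dfm(titulo, word, n)

# Fit a structural topic model with 5 topics
topic_model <- stm(tokens_adn, K = 5,
                   verbose = FALSE, init.type = "Spectral")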
This is my final code for the plot, but as I said, the topics show basically the same words, so I want to remove the repeated words.
# Assumption: topic_model_adn is the tidied model, one row per topic and term with its beta
topic_model_adn <- tidy(topic_model)

topic_model_adn %>%
  group_by(topic) %>%
  top_n(7, beta) %>%
  ungroup() %>%
  ggplot(aes(term, beta, fill = as.factor(topic))) +
  geom_col(alpha = 0.5, show.legend = FALSE) +
  facet_wrap(~ topic, scales = "free_y") +
  coord_flip() +
  labs(x = NULL, y = expression(beta),
       title = "Palabras por tema ADN")
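Going back to the filter() idea, this is roughly the kind of step I have in mind, as a minimal sketch only. The toy tibble `tokens_ejemplo` stands in for my tokenized data, and `palabras_extra` is a hypothetical vector of words to drop. Neither comes from my real code.

```r
library(dplyr)

# Toy stand-in for the tokenized data: one row per headline and word
tokens_ejemplo <- tibble(
  titulo = c("a", "a", "b", "b", "b"),
  word   = c("estallido", "protesta", "social", "chile", "gobierno")
)

# Hypothetical list of words I would like to remove (an assumption, not my real list)
palabras_extra <- c("estallido", "social")

# Keep only the rows whose word is not in the custom list
tokens_filtrados <- tokens_ejemplo %>%
  filter(!word %in% palabras_extra)

tokens_filtrados$word
```

I assume a step like this would have to go before cast_dfm(), so that the model is fit without those words, but I'm not sure that is the right place.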
Thanks! I hope someone can help me.