How do I filter words?

Asked: 2020-06-28 03:31:12

Tags: r filter

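I am doing topic modeling. I want to filter out some words, because all five topic plots show the same words. I don't know what code to use for this.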

I thought I could use filter() on the words I don't need, but it doesn't work.
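For reference, here is a minimal sketch of how filter() can drop unwanted words from a one-word-per-row token table. The toy table and the vector `palabras_a_quitar` are made up for illustration and are not part of the scraped data; only dplyr is assumed.

```r
library(dplyr)

# Toy token table with one word per row (made-up data, not the scraped news)
tokens_demo <- tibble(
  titulo = c("a", "a", "a", "b", "b"),
  word   = c("estallido", "social", "chile", "estallido", "protestas")
)

# Hypothetical list of words to drop, e.g. the search terms themselves
palabras_a_quitar <- c("estallido", "social")

# Keep only the rows whose word is NOT in the list
tokens_filtrados <- tokens_demo %>%
  filter(!word %in% palabras_a_quitar)

print(tokens_filtrados$word)
```

A filter like this would have to run on the token table before count() and cast_dfm(); otherwise the words still reach the model.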

These are the news pages I get the tokens from.

pagina_9_19_adn <- c("https://www.adnradio.cl/search/estallido%20social/page/9/",
                 "https://www.adnradio.cl/search/estallido%20social/page/10/",
                 "https://www.adnradio.cl/search/estallido%20social/page/11/",
                 "https://www.adnradio.cl/search/estallido%20social/page/12/",
                 "https://www.adnradio.cl/search/estallido%20social/page/13/",
                 "https://www.adnradio.cl/search/estallido%20social/page/14/",
                 "https://www.adnradio.cl/search/estallido%20social/page/15/",
                 "https://www.adnradio.cl/search/estallido%20social/page/16/",
                 "https://www.adnradio.cl/search/estallido%20social/page/17/",
                 "https://www.adnradio.cl/search/estallido%20social/page/18/",
                 "https://www.adnradio.cl/search/estallido%20social/page/19/")

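library(dplyr)  # for bind_rows() and the pipe

# paginaadn() is the scraping helper (its definition is not shown here)
tablas_all_pages_adn <- lapply(pagina_9_19_adn, paginaadn)
tablas_all_pages_adn <- bind_rows(tablas_all_pages_adn)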

This is the code that gets the tokens from the news.

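library(tidytext)  # unnest_tokens(), get_stopwords(), cast_dfm()
library(stm)       # stm()

# One word per row, then drop stop words
# (get_stopwords() defaults to English; the second filter removes Spanish ones)
tablas_all_pages_adn <- tablas_all_pages_adn %>%
  unnest_tokens(word, texto) %>%
  anti_join(get_stopwords(), by = "word") %>%
  filter(!word %in% stopwords::stopwords("es", "stopwords-iso"))

# Word counts per headline, cast to a document-feature matrix
tokens_adn <- tablas_all_pages_adn %>%
  count(titulo, word, sort = TRUE) %>%
  cast_dfm(titulo, word, n)

# Fit a 5-topic structural topic model
topic_model <- stm(tokens_adn, K = 5,
                   verbose = FALSE, init.type = "Spectral")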
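One option that might help here is to drop words that appear in almost every document before building the matrix, since such words tend to show up in every topic. The sketch below uses made-up data and a hypothetical 80% cutoff; the names `tokens_demo`, `frecuencia_doc`, and `palabras_comunes` are illustrative only.

```r
library(dplyr)

# Toy token table: "chile" appears in every document (made-up data)
tokens_demo <- tibble(
  titulo = c("n1", "n1", "n2", "n2", "n3", "n3"),
  word   = c("chile", "marcha", "chile", "plaza", "chile", "toque")
)

n_docs <- n_distinct(tokens_demo$titulo)

# Share of documents in which each word appears at least once
frecuencia_doc <- tokens_demo %>%
  distinct(titulo, word) %>%
  count(word, name = "n_docs_con_palabra") %>%
  mutate(proporcion = n_docs_con_palabra / n_docs)

# Hypothetical cutoff: words present in more than 80% of the documents
palabras_comunes <- frecuencia_doc %>%
  filter(proporcion > 0.8) %>%
  pull(word)

# Remove those words from the token table
tokens_recortados <- tokens_demo %>%
  filter(!word %in% palabras_comunes)

print(palabras_comunes)
```

The cutoff is a judgment call; a lower value removes more shared vocabulary but may also remove words that matter for some topics.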

This is my final code for the plot, but as I said, the topics basically all show the same words, so I want to remove the repeated words.

library(ggplot2)

# topic_model_adn is assumed to be the tidied model, e.g. tidy(topic_model),
# with one row per topic and term and its beta
topic_model_adn %>%
  group_by(topic) %>%
  top_n(7, beta) %>%
  ungroup() %>%
  ggplot(aes(term, beta, fill = as.factor(topic))) +
  geom_col(alpha = 0.5, show.legend = FALSE) +
  facet_wrap(~ topic, scales = "free_y") +
  coord_flip() +
  labs(x = NULL, y = expression(beta),
       title = "Palabras por tema ADN")
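If the goal is only to hide words that show up in the top list of more than one topic, one possible approach is to group by term after taking the top words and keep the terms that occur once. This is a sketch on a made-up table, `beta_demo`, standing in for a tidied topic model; it changes only what is plotted, not the model itself.

```r
library(dplyr)

# Toy stand-in for a tidied topic model: one row per topic/term (made-up betas)
beta_demo <- tibble(
  topic = c(1, 1, 1, 2, 2, 2),
  term  = c("chile", "marcha", "plaza", "chile", "toque", "queda"),
  beta  = c(0.30, 0.20, 0.10, 0.25, 0.15, 0.05)
)

# Top 2 terms per topic (the question uses 7)
top_terms <- beta_demo %>%
  group_by(topic) %>%
  top_n(2, beta) %>%
  ungroup()

# Keep only terms that made the top list of exactly one topic
top_terms_unicos <- top_terms %>%
  group_by(term) %>%
  filter(n() == 1) %>%
  ungroup()

print(top_terms_unicos)
```

The result can be piped into the same ggplot() call. Note that each topic may then show fewer words than requested, because the shared ones are dropped rather than replaced.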

Thanks! I hope someone can help me.

0 answers:

No answers yet