Remove words from a sentence

Time: 2019-10-09 08:53:07

Tags: r

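I have a data frame with a text column, and I am trying to remove certain words, stored in a vector, from that text. Please help me achieve this!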

stopwords <- c("today","hot","outside","so","its")
df <- data.frame(a = c("a1", "a2", "a3"), text = c("today the weather looks hot", "its so rainy outside", "today its sunny"))

Expected output:

   a                        text          new_text
1 a1 today the weather looks hot the weather looks
2 a2        its so rainy outside             rainy
3 a3             today its sunny             sunny

1 answer:

Answer 0 (score: 1)

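Paste all of the stopwords together into a single pattern separated by "|", then use gsub to remove every match. trimws strips the whitespace left at the ends.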

df$new_text <- trimws(gsub(paste0(stopwords, collapse = "|"), "", df$text))
df
#   a                        text          new_text
#1 a1 today the weather looks hot the weather looks
#2 a2        its so rainy outside             rainy
#3 a3             today its sunny             sunny
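
To see why this works, it can help to look at the intermediate values. The following is a minimal sketch that reuses the stopwords vector and sentences from the question; the names text, pattern and raw are introduced here only for illustration.

```r
stopwords <- c("today", "hot", "outside", "so", "its")
text <- c("today the weather looks hot",
          "its so rainy outside",
          "today its sunny")

# Collapse the vector into one regular expression of alternatives:
# "today|hot|outside|so|its"
pattern <- paste0(stopwords, collapse = "|")

# gsub() deletes every match but leaves the surrounding spaces behind
raw <- gsub(pattern, "", text)

# trimws() strips the leftover leading and trailing whitespace
result <- trimws(raw)
result
# "the weather looks", "rainy", "sunny"
```

Note that trimws only touches the ends of each string, so spaces left in the middle of a sentence are not collapsed.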

Or with str_remove_all from stringr:

stringr::str_remove_all(df$text, paste0(stopwords, collapse = "|"))

To be safer, add word boundaries around each stopword so that, for example, "so" is not removed from inside "some" or "something".

df$new_text <- trimws(gsub(paste0("\\b", stopwords, "\\b",
               collapse = "|"), "", df$text))
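
The difference between the two patterns shows up on a sentence that contains a stopword inside a longer word. The sketch below is an illustration only: the sentence x is hypothetical (it is not from the question), and the extra gsub("\\s+", " ", ...) step is an optional addition that collapses the runs of spaces left where words were removed mid-sentence.

```r
stopwords <- c("today", "hot", "outside", "so", "its")
x <- "something so hot is outside here"  # hypothetical test sentence

plain   <- paste0(stopwords, collapse = "|")
bounded <- paste0("\\b", stopwords, "\\b", collapse = "|")

# Without boundaries, the "so" inside "something" is removed too,
# so the result starts with "mething"
naive <- trimws(gsub(plain, "", x))

# With boundaries, only whole words are removed
no_stop <- gsub(bounded, "", x)

# Collapse repeated whitespace to a single space, then trim the ends
clean <- trimws(gsub("\\s+", " ", no_stop))
clean
# "something is here"
```

The same whitespace cleanup can be applied to df$new_text if the doubled spaces matter for later processing.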