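R - Remove words from a sentence

Time: 2019-01-22 20:26:47

Tags: r string

I have a vector in R, and the last index of the vector holds a list of words. I need to remove some words from that list.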

sentence <- "This is a sample sentence with words like or to be removed"
wordsToRemove <- c("The","an", "very", "of", "or","in","a","uses","that","be")

splitSent <- strsplit(sentence, " ")

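I tried wordsToRemove %in% list(splitSent), but it returns FALSE for every word. Is there another way to solve this?

Note: the sentence is one element of my vector, which also contains int and other data types. I have already gone through the following link: R: find vector in list of vectors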
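A minimal sketch of why the attempt above finds nothing, and of a split-and-filter fix. It assumes the sentence has already been pulled out of the larger vector as a single string; the names words, kept and result are illustrative and not part of the original post, and the case-insensitive comparison via tolower is an assumption about the desired behavior.

```r
sentence <- "This is a sample sentence with words like or to be removed"
wordsToRemove <- c("The", "an", "very", "of", "or", "in", "a", "uses", "that", "be")

# strsplit() returns a list holding one character vector per input string
splitSent <- strsplit(sentence, " ")

# list(splitSent) nests that list once more, so %in% never gets to compare
# against the individual words
wordsToRemove %in% list(splitSent)

# Extract the character vector with [[1]], then keep only the words that
# are not on the removal list (compared case-insensitively, an assumption)
words <- splitSent[[1]]
kept <- words[!(tolower(words) %in% tolower(wordsToRemove))]
result <- paste(kept, collapse = " ")
result
```

Note that the membership test is also flipped here: it asks which words of the sentence appear in wordsToRemove, not the other way around.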

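1 Answer:

Answer 0: (score: 4)

We can try using gsub here with a regex alternation that covers all of your terms. This answer works by searching for the following regex pattern and replacing each match with an empty string, which effectively removes it: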

\s*\b(The|an|very|of|or|in|a|uses|that|be)\b

This matches any of your terms as a whole word, along with any amount of leading whitespace.
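To see what the \b word boundaries contribute, here is a small sketch that compares the pattern with and without them. The one-word pattern and the input "a sample" are hypothetical and chosen only to isolate the effect; the variable names are illustrative.

```r
# Hypothetical one-word patterns, used only to isolate the effect of \b
withBoundary    <- "\\s*\\b(a)\\b"
withoutBoundary <- "\\s*(a)"

# With boundaries, only the standalone word "a" is removed
bounded <- gsub(withBoundary, "", "a sample", ignore.case = TRUE)

# Without boundaries, the "a" inside "sample" is removed as well
unbounded <- gsub(withoutBoundary, "", "a sample", ignore.case = TRUE)

bounded
unbounded
```

In both results a leading space is left behind where the first word used to be.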

sentence <- "This is a sample sentence with words like or to be removed"
sentence
wordsToRemove <- c("The","an", "very", "of", "or","in","a","uses","that","be")

regex <- paste0("\\s*\\b(", paste(wordsToRemove, collapse="|"), ")\\b")
output <- sub("^\\s+", "", gsub(regex, "", sentence, ignore.case=TRUE))
output

[1] "This is a sample sentence with words like or to be removed"
[1] "This is sample sentence with words like to removed"

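Note that I actually make an additional call to sub above, because we have to trim any leading whitespace at the start of the string that the pattern might miss (for example, when the first word of the sentence is one of the removed terms).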
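A short sketch of the case that makes the extra sub call necessary. The input sentence2 is a hypothetical example (not from the question) whose first word is on the removal list; removed and trimmed are illustrative names.

```r
wordsToRemove <- c("The", "an", "very", "of", "or", "in", "a", "uses", "that", "be")
regex <- paste0("\\s*\\b(", paste(wordsToRemove, collapse = "|"), ")\\b")

# Hypothetical input whose first word is one of the terms to remove
sentence2 <- "The cat is in a box"

# gsub removes each term together with the whitespace before it, but the
# space that followed "The" stays at the front of the string
removed <- gsub(regex, "", sentence2, ignore.case = TRUE)

# sub with ^\s+ trims that leftover leading whitespace
trimmed <- sub("^\\s+", "", removed)

removed
trimmed
```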