I have a vector in R whose last index holds a list of words. I need to remove some words from that list.
sentence <- "This is a sample sentence with words like or to be removed"
wordsToRemove <- c("The","an", "very", "of", "or","in","a","uses","that","be")
splitSent <- strsplit(sentence, " ")
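I tried wordsToRemove %in% list(splitSent), but it returns FALSE for every element. Is there another way to solve this?
Note: the sentence is one element of my vector, which also holds ints and other data types. I have already gone through the linked question R: find vector in list of vectors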
Answer 0 (score: 4)
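We can use gsub here with a regular expression that covers all of your terms. This answer works by searching for the following pattern and replacing each match with an empty string, which effectively removes it: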
\s*\b(The|an|very|of|or|in|a|uses|that|be)\b
This matches any of your terms as a whole word, together with any amount of leading whitespace.
sentence <- "This is a sample sentence with words like or to be removed"
sentence
wordsToRemove <- c("The","an", "very", "of", "or","in","a","uses","that","be")
regex <- paste0("\\s*\\b(", paste(wordsToRemove, collapse="|"), ")\\b")
output <- sub("^\\s+", "", gsub(regex, "", sentence, ignore.case=TRUE))
output
[1] "This is a sample sentence with words like or to be removed"
[1] "This is sample sentence with words like to removed"
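Note that I wrapped the gsub call above in a call to sub, because we have to trim any leading whitespace that the pattern might miss. For example, if the first word of the sentence is removed, the space that followed it is left at the start of the string.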
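As a possible alternative that stays closer to the original attempt, here is a minimal sketch without a regex. The attempt in the question returned all FALSE because strsplit already returns a list, so list(splitSent) is a list containing a list and no element of it equals a single word. The sketch takes the first element of the strsplit result and filters it with %in%. The names words, kept, and result are illustrative, and the case-insensitive comparison via tolower is an assumption chosen to mirror ignore.case=TRUE in the answer above.

```r
sentence <- "This is a sample sentence with words like or to be removed"
wordsToRemove <- c("The", "an", "very", "of", "or", "in", "a", "uses", "that", "be")

# strsplit() returns a list with one character vector per input string,
# so [[1]] extracts the words of our single sentence.
words <- strsplit(sentence, " ", fixed = TRUE)[[1]]

# Keep only the words that are NOT in the removal set.
# tolower() on both sides makes the comparison case-insensitive (an assumption).
kept <- words[!tolower(words) %in% tolower(wordsToRemove)]

# Glue the remaining words back into one string.
result <- paste(kept, collapse = " ")
result
```

Because this splits on single spaces, it assumes the words are separated by exactly one space and carry no attached punctuation; for messier input the regex approach above is likely the better fit.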