Question

I want to remove words from a character vector. This is how I do it:

library(tm)
words = c("the", "The", "Intelligent", "this", "This")
words_to_remove = c("the", "This")
removeWords(tolower(words), tolower(words_to_remove))

This works nicely, but I would like "Intelligent" to be returned as it is, meaning "Intelligent" rather than "intelligent". Is it possible to apply tolower only inside the call to removeWords?

Answer 1

You can use a base R approach with grepl:

words_to_remove = c("the", "This")
pattern <- paste0("\\b", words_to_remove, "\\b", collapse="|")
words = c("the", "The", "Intelligent", "this", "This")

res <- grepl(pattern, words, ignore.case=TRUE)
words[!res]

[1] "Intelligent"


The trick I use here is that the call to paste0 generates the following pattern:

\bthe\b|\bThis\b

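With this pattern, a single regular expression evaluation determines whether each string in words matches one of the words to remove. Because ignore.case=TRUE is set, the match ignores case, while the strings in words themselves are left unchanged.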
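To make the intermediate steps visible, here is a minimal sketch that prints the generated pattern and the logical vector returned by grepl before subsetting. The variable names hits and kept are illustrative only and do not appear in the original answer.

```r
words <- c("the", "The", "Intelligent", "this", "This")
words_to_remove <- c("the", "This")

# Wrap each word in \b word boundaries and join the pieces with "|".
pattern <- paste0("\\b", words_to_remove, "\\b", collapse = "|")
cat(pattern, "\n")

# TRUE for every element that contains one of the words, ignoring case.
hits <- grepl(pattern, words, ignore.case = TRUE)
print(hits)

# Keep the non-matching elements; their original capitalization is untouched.
kept <- words[!hits]
print(kept)
```

Note that grepl flags any element that contains one of the words, so a multi-word string such as "the cat" would also be dropped. If only whole-element matches should count, anchoring the pattern with ^ and $ instead of \b is one possible variation.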

Answer 2

Here is another option using base R's %in% operator:

words = c("the", "The", "Intelligent", "this", "This")
words_to_remove = c("the", "This")

words[!(tolower(words) %in% tolower(words_to_remove))]

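%in% returns TRUE for every element of words whose lowercase form appears in the lowercased words_to_remove. Negating that result selects the words to keep, and since tolower is applied only inside the comparison, the kept words retain their original case.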
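If this is needed in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the function name remove_words_ci is hypothetical and is not part of tm or base R.

```r
# Hypothetical helper: drop words case-insensitively, keep original casing.
remove_words_ci <- function(words, words_to_remove) {
  # Lowercase both sides only for the comparison, never for the output.
  keep <- !(tolower(words) %in% tolower(words_to_remove))
  words[keep]
}

result <- remove_words_ci(
  c("the", "The", "Intelligent", "this", "This"),
  c("the", "This")
)
print(result)  # [1] "Intelligent"
```

Unlike removeWords, which returns a vector of the same length with the removed words blanked out, this helper drops the matching elements entirely.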

R tolower only within function
