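I'm trying to find an efficient way to remove every instance of a set of words from an input vector of phrases, given a vector of words to remove.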
vectorOfWordsToRemove <- c('cat', 'monkey', 'wolf', 'mouses')
vectorOfPhrases <- c('the cat and the monkey walked around the block', 'the wolf and the mouses ate lunch with the monkey', 'this should remain unmodified')
remove_strings <- function(a, b) { stringr::str_replace_all(a,b, '')}
remove_strings(vectorOfPhrases, vectorOfWordsToRemove)
The output I want is
vectorOfPhrases <- c('the and the walked around the block', 'the and the ate lunch with the', 'this should remain unmodified')
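That is, every instance of every word in the vector vectorOfWordsToRemove should be removed from vectorOfPhrases.
I can do this with a for loop, but it is very slow, and it seems there should be a vectorized way to do it efficiently.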
Thanks
Answer 0 (score: 1)
First, I would create a vector of empty strings to use as the replacements, one per word to remove:
vectorOfNothing <- rep('', 4)
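Then use the qdap library to replace each element of the pattern vector with the matching element of the replacement vector: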
library(qdap)
vectorOfPhrases <- qdap::mgsub(vectorOfWordsToRemove,
vectorOfNothing,
vectorOfPhrases)
> vectorOfPhrases
[1] "the and the walked around the block" "the and the ate lunch with the"
[3] "this should remain unmodified"
Answer 1 (score: 1)
You can use gsubfn():
library(gsubfn)
replaceStrings <- as.list(rep("", 4))
newPhrases <- gsubfn("\\S+", setNames(replaceStrings, vectorOfWordsToRemove), vectorOfPhrases)
> newPhrases
[1] "the and the walked around the block" "the and the ate lunch with the"
[3] "this should remain unmodified"
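For comparison, the same result can probably be reached in base R alone, without an extra package. The sketch below is not taken from either answer: `remove_words` is a hypothetical helper name, and it assumes the words to remove are plain words containing no regex metacharacters. It joins the words into a single alternation pattern with word boundaries, so that one vectorized `gsub()` call handles every phrase, then tidies up the whitespace left behind.

```r
vectorOfWordsToRemove <- c('cat', 'monkey', 'wolf', 'mouses')
vectorOfPhrases <- c('the cat and the monkey walked around the block',
                     'the wolf and the mouses ate lunch with the monkey',
                     'this should remain unmodified')

# Hypothetical helper (not from the thread); assumes `words` holds plain
# words with no regex metacharacters.
remove_words <- function(phrases, words) {
  # Build one pattern such as \b(cat|monkey|wolf|mouses)\b
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  # gsub() is vectorized over `phrases`, so no explicit loop is needed
  stripped <- gsub(pattern, "", phrases, perl = TRUE)
  # Collapse the runs of spaces left by removed words, then trim the ends
  trimws(gsub(" +", " ", stripped))
}

newPhrases <- remove_words(vectorOfPhrases, vectorOfWordsToRemove)
```

The word boundaries keep the pattern from matching inside longer words (for example, "cat" inside "concatenate"). If the words might contain characters that are special in regular expressions, they would need to be escaped before being pasted into the pattern.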