Here is some sample data:
exclude.words <- c("zoznam","azet","dovera","joj","alza","telecom","google","post","sme")
main.data <- c("zoznam","registration","azet","azet.com","dovera","dna","joj","alza","telecom","google","post","sme")
The following works when the words are identical (exact match), but note that azet.com is not excluded! For that, agrepl() could be used.
main.data[!(main.data %in% exclude.words)]
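To make the problem concrete, here is a small sketch of what the exact-match filter returns on this sample data. `%in%` compares whole strings only, so "azet.com" survives even though it contains "azet".

```r
exclude.words <- c("zoznam","azet","dovera","joj","alza","telecom","google","post","sme")
main.data <- c("zoznam","registration","azet","azet.com","dovera","dna","joj","alza","telecom","google","post","sme")

# %in% tests for exact, whole-string equality only
kept <- main.data[!(main.data %in% exclude.words)]
print(kept)
# [1] "registration" "azet.com"     "dna"
```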
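So how can agrepl() be used with two vectors?
# does not work: agrepl() uses only the first element of its pattern argument (with a warning)
main.data[!agrepl(main.data, exclude.words)]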
Answer 0 (score: 1)
main.data[!as.logical(rowSums(sapply(exclude.words, function(x) agrepl(x, main.data))))]
# [1] "registration" "dna"
# explanation: one row per element of main.data, one column per excluded word
sapply(exclude.words, function(x) agrepl(x, main.data))
# zoznam azet dovera joj alza telecom google post sme
# [1,] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [2,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [3,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [4,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [5,] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
# [6,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [7,] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
# [8,] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
# [9,] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
# [10,] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
# [11,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
# [12,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
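The intermediate matrix is not strictly needed. A base-R sketch that combines the per-word agrepl() results with Reduce() and the element-wise OR operator `|` instead of rowSums(); it uses the same default agrepl() settings (max.distance = 0.1), so it should keep the same elements as the code above.

```r
exclude.words <- c("zoznam","azet","dovera","joj","alza","telecom","google","post","sme")
main.data <- c("zoznam","registration","azet","azet.com","dovera","dna","joj","alza","telecom","google","post","sme")

# One logical vector per excluded word: TRUE where agrepl() finds an
# approximate match of that word (the pattern) in main.data (x)
hits <- lapply(exclude.words, agrepl, x = main.data)

# OR the vectors together element-wise: TRUE if any excluded word matched
any.hit <- Reduce(`|`, hits)

kept <- main.data[!any.hit]
print(kept)
```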
Answer 1 (score: 1)
As mentioned in the comments, you can use:
main.data[!grepl(paste(exclude.words, collapse = "|"), main.data)]
to exclude any word in main.data that partially or fully matches an entry of exclude.words.
paste(exclude.words, collapse = "|")
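creates a single string with "|" (regex alternation, i.e. logical OR) between the entries of exclude.words, which can be used as a single pattern in grepl(). Therefore you don't need to loop over the individual words.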
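One caveat with the collapsed pattern is that each word is interpreted as a regular expression, so a word containing metacharacters such as "." would match more than intended. A sketch, assuming the words should be matched literally and none of them contains the sequence "\E": wrap each word in PCRE's \Q...\E quoting before joining with "|", and pass perl = TRUE.

```r
exclude.words <- c("zoznam","azet","dovera","joj","alza","telecom","google","post","sme")
main.data <- c("zoznam","registration","azet","azet.com","dovera","dna","joj","alza","telecom","google","post","sme")

# \Q...\E makes PCRE treat the enclosed text literally (assumes no word
# itself contains "\E"); "|" between the quoted words is regex OR
pattern <- paste0("\\Q", exclude.words, "\\E", collapse = "|")

# A single grepl() call tests every element against all words at once
kept <- main.data[!grepl(pattern, main.data, perl = TRUE)]
print(kept)
```

For this sample data the result is the same as with the unquoted pattern; the quoting only matters once the excluded words contain metacharacters.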
Answer 2 (score: 1)
You can use this functional programming approach:
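library(functional)  # provides Compose()
# one filter function per excluded word: drop elements of x that contain u
funcs = lapply(exclude.words, function(u) function(x) x[!grepl(u, x)])
# chain all filters together and apply the result to main.data
Reduce(Compose, funcs)(main.data)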
#[1] "registration" "dna"
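The same chain of filters can also be written in base R without the functional package, by letting Reduce() fold over the words with main.data as the initial value. This is a sketch; note that it passes fixed = TRUE (a choice made here, not part of the answer above) so each word is matched literally rather than as a regex.

```r
exclude.words <- c("zoznam","azet","dovera","joj","alza","telecom","google","post","sme")
main.data <- c("zoznam","registration","azet","azet.com","dovera","dna","joj","alza","telecom","google","post","sme")

# Starting from main.data (the init argument), each step of the fold
# removes the elements that contain the current word u as a literal substring
kept <- Reduce(function(acc, u) acc[!grepl(u, acc, fixed = TRUE)],
               exclude.words, main.data)
print(kept)
```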