Remove strings found in vector 1 from vector 2

Asked: 2016-01-19 09:26:33

Tags: r gsub sapply

I have these two vectors:

sample1 <- c(".aaa", ".aarp", ".abb", ".abbott", ".abogado")
sample2 <- c("try1.aarp", "www.tryagain.aaa", "255.255.255.255", "onemoretry.abb.abogado")

I am trying to remove the sample1 strings wherever they occur in sample2. The closest I have come is iterating with sapply, which gives me:

sapply(sample1, function(i) gsub(i, "", sample2))

     .aaa                     .aarp                    .abb                 .abbott                  .abogado          
[1,] "try1.aarp"              "try1"                   "try1.aarp"          "try1.aarp"              "try1.aarp"       
[2,] "www.tryagain"           "www.tryagain.aaa"       "www.tryagain.aaa"   "www.tryagain.aaa"       "www.tryagain.aaa"
[3,] "255.255.255.255"        "255.255.255.255"        "255.255.255.255"    "255.255.255.255"        "255.255.255.255" 
[4,] "onemoretry.abb.abogado" "onemoretry.abb.abogado" "onemoretry.abogado" "onemoretry.abb.abogado" "onemoretry.abb"  

The expected output, of course, would be:

[1] "www.tryagain"    "try1"            "onemoretry"      "255.255.255.255"

Thanks for your time.

2 answers:

Answer 0 (score: 4)

Try this:

sample1 <- c(".aaa", ".aarp", ".abb", ".abbott", ".abogado")
sample2 <- c("try1.aarp", "www.tryagain.aaa", "255.255.255.255", "onemoretry.abb.abogado")
paste0("(",paste(sub("\\.", "\\\\.", sample1), collapse="|"),")\\b")
# [1] "(\\.aaa|\\.aarp|\\.abb|\\.abbott|\\.abogado)\\b"
gsub(paste0("(",paste(sub("\\.", "\\\\.", sample1), collapse="|"),")\\b"), "", sample2)
# [1] "try1"            "www.tryagain"    "255.255.255.255" "onemoretry" 

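Explanation:

  • sub("\\.", "\\\\.", sample1) escapes the dot in each element, because a dot is a special character in regular expressions (it matches any character). sub() replaces only the first match per element, which is enough here since each string contains a single dot.

  • paste(sub("\\.", "\\\\.", sample1), collapse="|") joins all the elements into one string, using | as the separator.

  • paste0("(",paste(sub("\\.", "\\\\.", sample1), collapse="|"),")\\b") builds a regex that places all the elements in a capturing group followed by a word boundary. The \\b is essential here: it forces an exact match, so a shorter entry such as .abb cannot match just the beginning of a longer word.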
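
To see what the word boundary buys, here is a minimal sketch that compares the pattern with and without \\b. The input "site.abbey" is a made-up example, not from the question; its suffix merely starts with .abb.

```r
sample1 <- c(".aaa", ".aarp", ".abb", ".abbott", ".abogado")

# Escape the literal dot in each suffix, then join the suffixes with "|"
alternation <- paste(sub("\\.", "\\\\.", sample1), collapse = "|")

with_boundary    <- paste0("(", alternation, ")\\b")
without_boundary <- paste0("(", alternation, ")")

x <- "site.abbey"  # hypothetical input, not part of sample2

gsub(without_boundary, "", x)  # ".abb" is removed, leaving "siteey"
gsub(with_boundary, "", x)     # no match, "site.abbey" is left alone
```

Without \\b the .abb alternative removes part of an unrelated suffix; with it, the string is left untouched because "e" follows ".abb" and no word boundary exists there.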

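Answer 1 (score: 1)

We can paste the elements of 'sample1' together with | as the separator, use the result as the pattern argument of gsub, and replace the matches with ''.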

gsub(paste(sample1, collapse='|'), '', sample2)
#[1] "try1"            "www.tryagain"    "255.255.255.255" "onemoretry"  
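
One caveat, sketched under an assumed input: the dots in sample1 are not escaped in this pattern, so each . matches any character. That is harmless for sample2, but a made-up string such as "baaad.example.com" (not from the question) shows the difference.

```r
sample1 <- c(".aaa", ".aarp", ".abb", ".abbott", ".abogado")

# Pattern as used above: "." acts as a regex wildcard here
unescaped <- paste(sample1, collapse = "|")

# Same pattern with each dot escaped so it only matches a literal "."
escaped <- paste(gsub("\\.", "\\\\.", sample1), collapse = "|")

x <- "baaad.example.com"  # hypothetical input for illustration

gsub(unescaped, "", x)  # "baaa" is removed because "." matches "b"
gsub(escaped, "", x)    # unchanged: the string contains no literal ".aaa"
```

If the inputs may contain such strings, escaping the dots first is the safer choice.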

Or use mgsub from the qdap package:

library(qdap)
mgsub(sample1, '', sample2)
#[1] "try1"            "www.tryagain"    "255.255.255.255" "onemoretry"
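
If installing qdap is not an option, a similar effect can be sketched in base R. remove_all below is a hypothetical helper, not qdap's implementation: it assumes each entry should be treated as a literal string (fixed = TRUE) and removes the longest entries first, so that .abbott is tried before .abb.

```r
sample1 <- c(".aaa", ".aarp", ".abb", ".abbott", ".abogado")
sample2 <- c("try1.aarp", "www.tryagain.aaa", "255.255.255.255", "onemoretry.abb.abogado")

# Hypothetical helper: strip every literal string in `patterns` from `text`,
# feeding the result of each removal into the next one.
remove_all <- function(patterns, text) {
  # Longest patterns first, so a short entry cannot split a longer one
  ordered <- patterns[order(nchar(patterns), decreasing = TRUE)]
  Reduce(function(acc, p) gsub(p, "", acc, fixed = TRUE), ordered, text)
}

remove_all(sample1, sample2)
#[1] "try1"            "www.tryagain"    "255.255.255.255" "onemoretry"
```

Unlike the sapply attempt in the question, each removal here is applied to the result of the previous one, so a single vector comes back instead of a matrix. Because the matching is literal, there is no word-boundary check: a string ending in ".abbey" would still lose its ".abb".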