Using R grep regular expressions with elements in a vector (FOLLOW UP)

Date: 2018-03-14 03:33:13

Tags: r regex

Following up on this question, I have another example where I cannot use the accepted answer.

Again, I want to find each exact element of groups in the labs vector...

I tried the following, but it does not work...

labs <- c("Beijing -- T0 -- BC-89 + CN",
"Beijing -- T24 -- BC-89 + CN",
"Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC",
"Beijing -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC",
"Beijing -- T0 -- BC-89 with 2% Puricare + 5% Merquat + CN",
"Beijing -- T24 -- BC-89 with 2% Puricare + 5% Merquat + CN",
"Beijing -- T0 -- BC-89 + CN",
"Zhangjiakou -- T0 -- BC-89 + CN",
"Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC",
"Zhangjiakou -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC",
"Beijing -- T0 -- BC-89 with 2% Puricare + 5% Merquat + CN",
"Zhangjiakou -- T0 -- BC-89 with 2% Puricare + 5% Merquat + CN",
"Beijing -- T0 -- BC-89 + CN",
"Beijing -- T0 -- BC-89 + CN",
"Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC",
"Beijing -- T24 -- BC-89 + CN",
"Beijing -- T24 -- BC-89 + CN",
"Beijing -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC",
"Zhangjiakou -- T0 -- BC-89 + CN",
"Zhangjiakou -- T0 -- BC-89 + CN",
"Zhangjiakou -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC",
"Zhangjiakou -- T24 -- BC-89 + CN",
"Zhangjiakou -- T24 -- BC-89 + CN",
"Zhangjiakou -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC")
labs
groups <- c("BC-89 + CN", "BC-89 + CN with 2% DD + 1.6% ZC", "BC-89 with 2% Puricare + 5% Merquat + CN")
groups

Any help?

3 answers:

Answer 0: (score: 1)

Try

lapply(groups, function(g)
  grep(gsub("\\+", "\\\\+", paste0(g, "$")), labs, value = TRUE))
# [[1]]
# [1] "Beijing -- T0 -- BC-89 + CN"     
# [2] "Beijing -- T24 -- BC-89 + CN"    
# [3] "Beijing -- T0 -- BC-89 + CN"     
# [4] "Zhangjiakou -- T0 -- BC-89 + CN" 
# [5] "Beijing -- T0 -- BC-89 + CN"     
# [6] "Beijing -- T0 -- BC-89 + CN"     
# [7] "Beijing -- T24 -- BC-89 + CN"    
# [8] "Beijing -- T24 -- BC-89 + CN"    
# [9] "Zhangjiakou -- T0 -- BC-89 + CN" 
# [10] "Zhangjiakou -- T0 -- BC-89 + CN" 
# [11] "Zhangjiakou -- T24 -- BC-89 + CN"
# [12] "Zhangjiakou -- T24 -- BC-89 + CN"
# 
# [[2]]
# [1] "Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC"     
# [2] "Beijing -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC"    
# [3] "Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC"     
# [4] "Zhangjiakou -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC" 
# [5] "Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC"     
# [6] "Beijing -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC"    
# [7] "Zhangjiakou -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC" 
# [8] "Zhangjiakou -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC"
# 
# [[3]]
# [1] "Beijing -- T0 -- BC-89 with 2% Puricare + 5% Merquat + CN"    
# [2] "Beijing -- T24 -- BC-89 with 2% Puricare + 5% Merquat + CN"   
# [3] "Beijing -- T0 -- BC-89 with 2% Puricare + 5% Merquat + CN"    
# [4] "Zhangjiakou -- T0 -- BC-89 with 2% Puricare + 5% Merquat + CN"

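The problem with your approach is that, for example, groups[1] is "BC-89 + CN", which contains +, and + has a special meaning in regular expressions. Adding fixed = TRUE to grep would fix that on its own, but then $ stops working as an anchor. So what I did was escape the + in the group names first.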
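To make the three cases concrete, here is a minimal sketch using one group name and two of the labels from the question. The variable names as_regex, as_literal and escaped are only illustrative.

```r
g <- "BC-89 + CN"
x <- c("Beijing -- T0 -- BC-89 + CN",
       "Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC")

# As a regex, " +" means "one or more spaces", so the literal "+" never matches.
as_regex <- grepl(g, x)

# fixed = TRUE matches the text literally, but "$" can no longer anchor the end,
# so the longer label matches as well.
as_literal <- grepl(g, x, fixed = TRUE)

# Escaping "+" keeps regex mode, so "$" still anchors at the end of the label.
escaped <- grepl(paste0(gsub("\\+", "\\\\+", g), "$"), x)

print(as_regex)    # FALSE FALSE
print(as_literal)  # TRUE TRUE
print(escaped)     # TRUE FALSE
```

Only the escaped, anchored pattern separates the short label from the longer one.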

Alternatively, in line with your linked answer, you can

lapply(groups, function(g)
  grep(paste0(g, "$"), paste0(labs, "$"), value = TRUE, fixed = TRUE))
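Note that with value = TRUE the call above returns the labels with the extra "$" still appended, because grep returns elements of the vector it was given. If that is unwanted, one possible alternative (not part of the original answer) is base R's endsWith(), which compares suffixes literally and needs neither escaping nor an anchor. A small sketch, assuming R 3.3.0 or later and a shortened labs_demo vector made up from the question's labels:

```r
groups <- c("BC-89 + CN",
            "BC-89 + CN with 2% DD + 1.6% ZC",
            "BC-89 with 2% Puricare + 5% Merquat + CN")

# Shortened stand-in for the labs vector from the question.
labs_demo <- c("Beijing -- T0 -- BC-89 + CN",
               "Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC",
               "Zhangjiakou -- T0 -- BC-89 with 2% Puricare + 5% Merquat + CN")

# For each group, keep the labels whose text ends with that group name.
by_group <- lapply(groups, function(g) labs_demo[endsWith(labs_demo, g)])
print(by_group)
```

Each list element holds the unmodified labels for one group.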

Answer 1: (score: 1)

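Try this from the stringr package. The coll() option implements "human-readable collation rules", which can help you match things that look identical but that R, for some reason, refuses to match at first: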

> library(stringr)
> str_detect(labs,coll(groups))
 [1]  TRUE FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE
[16]  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE
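One thing to keep in mind when reading this output: str_detect() is vectorised over both the string and the pattern, so the three group names are recycled along labs and label i is only compared with one group. The sketch below imitates that pairing in base R (it is an illustration, not stringr's implementation) and contrasts it with checking every label against every group. The names labs_demo, pairwise and per_group are made up for the example.

```r
groups <- c("BC-89 + CN",
            "BC-89 + CN with 2% DD + 1.6% ZC",
            "BC-89 with 2% Puricare + 5% Merquat + CN")

# First three labels from the question.
labs_demo <- c("Beijing -- T0 -- BC-89 + CN",
               "Beijing -- T24 -- BC-89 + CN",
               "Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC")

# Pairwise: label i is checked only against group i (after recycling).
pairwise <- mapply(function(l, g) grepl(g, l, fixed = TRUE),
                   labs_demo, rep_len(groups, length(labs_demo)),
                   USE.NAMES = FALSE)

# Every label against every group: one column per group.
per_group <- sapply(groups, function(g) endsWith(labs_demo, g))

print(pairwise)
print(per_group)
```

In the pairwise result the second and third labels come out FALSE even though each belongs to one of the groups, which is why a per-group loop or matrix is closer to what the question asks for.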

Answer 2: (score: 0)

+ is a special character in regular expressions. You need "\\+" to escape it.

new_group <- gsub("\\+",replacement = "\\\\+",x =groups)

Also, "|" acts like "or" in grep.

new_group1 <- paste0(new_group,collapse = "|")

grep(pattern = new_group1, x = labs, value = TRUE)
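Because the combined pattern has no end anchor, it also matches any label that merely contains a group name somewhere. A possible refinement, sketched here as an assumption rather than as part of the original answer, is to wrap the alternation in parentheses and add $. The third label in labs_demo is hypothetical and was added only to show a non-match.

```r
groups <- c("BC-89 + CN",
            "BC-89 + CN with 2% DD + 1.6% ZC",
            "BC-89 with 2% Puricare + 5% Merquat + CN")

# The last label is hypothetical: it contains a group name but does not end with one.
labs_demo <- c("Beijing -- T0 -- BC-89 + CN",
               "Zhangjiakou -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC",
               "Beijing -- T0 -- BC-89 + CN (rerun)")

# Escape "+" as in the answer, then join the alternatives with "|".
escaped <- gsub("\\+", "\\\\+", groups)

# Without an anchor, every label containing a group name matches.
unanchored <- grep(paste0(escaped, collapse = "|"), labs_demo, value = TRUE)

# Grouping the alternatives and appending "$" requires the label to end with one.
anchored <- grep(paste0("(", paste0(escaped, collapse = "|"), ")$"),
                 labs_demo, value = TRUE)

print(unanchored)
print(anchored)
```

This returns the labels that end in any of the groups as one vector; to keep the results separate per group, loop over groups as in the first answer.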