r: Regular expression with a negation pattern

Date: 2017-10-23 21:17:31

Tags: r regex grep

Suppose I have the following strings and want to use grep to see the matches:

business_metric_one
business_metric_one_dk
business_metric_one_none
business_metric_two
business_metric_two_dk
business_metric_two_none

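...and various other metrics. I want to match only the first one in each group (business_metric_one, business_metric_two, etc.). They are not in an ordered list, so I can't index into them and have to use grep. At first I thought of: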

.*metric.*[^_dk|^_none]$

But this doesn't seem to work. Any ideas?
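A quick check in R suggests why the attempt returns nothing: square brackets define a bracket expression that matches exactly one character, so [^_dk|^_none] means "one character that is not _, d, k, |, ^, n, o or e". It does not mean "does not end with _dk or _none", and | has no alternation meaning inside brackets. Since every example string ends in e, k or o, no string qualifies. This is a minimal sketch using the default (non-PCRE) engine of base R grep.

```r
x <- c("business_metric_one", "business_metric_one_dk", "business_metric_one_none",
       "business_metric_two", "business_metric_two_dk", "business_metric_two_none")

# The bracket expression is a negated character class: the final character
# must not be one of _ d k | ^ n o e. Inside [], | and ^ are plain characters.
attempt <- grep(".*metric.*[^_dk|^_none]$", x, value = TRUE)
attempt  # character(0): nothing matches

# Look at the last character of each string: all of them are in the excluded set.
last_chars <- substring(x, nchar(x))
last_chars  # "e" "k" "e" "o" "k" "e"
```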

2 answers:

Answer 0 (score: 2)

You need to use a PCRE pattern to filter the character vector:

x <- c("business_metric_one","business_metric_one_dk","business_metric_one_none","business_metric_two","business_metric_two_dk","business_metric_two_none")
grep("metric(?!.*_(?:dk|none))", x, value=TRUE, perl=TRUE)
## => [1] "business_metric_one" "business_metric_two"

See the R demo.

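The pattern metric(?!.*_(?:dk|none)) matches:

  • metric - the substring metric
  • (?!.*_(?:dk|none)) - a negative lookahead: the match fails if metric is followed by any 0+ characters other than line breaks, then _, and then dk or none.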
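To see what the lookahead adds, one can compare a plain substring test with the full pattern as logical vectors. This is a small sketch reusing the question's strings; grepl with perl = TRUE is needed because the default engine in base R does not support lookarounds.

```r
x <- c("business_metric_one", "business_metric_one_dk", "business_metric_one_none",
       "business_metric_two", "business_metric_two_dk", "business_metric_two_none")

# Plain substring test: every element contains "metric".
plain <- grepl("metric", x)

# With the negative lookahead, "metric" only counts when the rest of the
# string contains no "_dk" and no "_none".
keep <- grepl("metric(?!.*_(?:dk|none))", x, perl = TRUE)
keep     # TRUE FALSE FALSE TRUE FALSE FALSE
x[keep]  # "business_metric_one" "business_metric_two"
```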

See the regex demo.

Note: if you only need to match values that contain metric and do not end with _dk or _none, use the variation metric.*$(?<!_dk|_none) (also with perl=TRUE), where the negative lookbehind (?<!_dk|_none) fails the match if the string ends with _dk or _none.
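The two patterns differ only when _dk or _none appears somewhere other than the end. The sketch below adds a hypothetical name, business_metric_dk_one (not from the question), purely to show that difference: the lookahead rejects _dk anywhere after metric, while the lookbehind only rejects it as a suffix. PCRE allows top-level lookbehind alternatives of different fixed lengths, which is why (?<!_dk|_none) is accepted.

```r
# "business_metric_dk_one" is a hypothetical name used only for illustration.
y <- c("business_metric_one", "business_metric_one_dk",
       "business_metric_one_none", "business_metric_dk_one")

# Lookahead: rejects _dk/_none anywhere after "metric".
ahead <- grep("metric(?!.*_(?:dk|none))", y, value = TRUE, perl = TRUE)
ahead   # keeps only "business_metric_one"

# Lookbehind placed at the end of the string: rejects only a _dk/_none suffix.
behind <- grep("metric.*$(?<!_dk|_none)", y, value = TRUE, perl = TRUE)
behind  # keeps "business_metric_one" and "business_metric_dk_one"
```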

Answer 1 (score: 0)

You can also do this:

grep("^([[:alpha:]]+_){2}[[:alpha:]]+$", string, value = TRUE)
# [1] "business_metric_one" "business_metric_two"
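The pattern ^([[:alpha:]]+_){2}[[:alpha:]]+$ works by counting segments: it accepts exactly three underscore-separated runs of letters, so the four-part _dk and _none names fail. A non-regex cross-check that counts the parts with strsplit agrees on this data. Note the assumption: the regex also rejects names whose parts contain digits, so the two approaches are only equivalent for purely alphabetic names like these.

```r
string <- c("business_metric_one", "business_metric_one_dk", "business_metric_one_none",
            "business_metric_two", "business_metric_two_dk", "business_metric_two_none")

# Exactly two "letters_" groups followed by one final run of letters,
# i.e. exactly three alphabetic parts separated by underscores.
by_regex <- grep("^([[:alpha:]]+_){2}[[:alpha:]]+$", string, value = TRUE)

# Cross-check without a regex: count the underscore-separated parts.
n_parts <- lengths(strsplit(string, "_", fixed = TRUE))
by_count <- string[n_parts == 3]

identical(by_regex, by_count)  # TRUE for this data
```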

Or use grepl to match dk or none, then negate the resulting logical vector when indexing the original string:

string[!grepl("(dk|none)", string)]
# [1] "business_metric_one" "business_metric_two"
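One caveat with the unanchored (dk|none): it matches those letters anywhere in the string, not just as a suffix. The sketch below uses a hypothetical name, business_metric_nonempty (not from the question), whose last part merely contains "none"; anchoring the pattern as _(dk|none)$ keeps it.

```r
# "business_metric_nonempty" is hypothetical: it only contains "none" by accident.
y <- c("business_metric_one", "business_metric_one_dk",
       "business_metric_one_none", "business_metric_nonempty")

# Unanchored: "none" inside "nonempty" also counts, so that name is dropped.
loose <- y[!grepl("(dk|none)", y)]

# Require a leading "_" and the end of the string: only real suffixes count.
strict <- y[!grepl("_(dk|none)$", y)]
```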

More specifically:

string[!grepl("business_metric_[[:alpha:]]+_(dk|none)", string)]
# [1] "business_metric_one" "business_metric_two"

Data:

string = c("business_metric_one","business_metric_one_dk","business_metric_one_none","business_metric_two","business_metric_two_dk","business_metric_two_none")