Suppose I have the following strings and want to use grep to see which ones match:
business_metric_one
business_metric_one_dk
business_metric_one_none
business_metric_two
business_metric_two_dk
business_metric_two_none
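and so on for various other metrics. I want to match only the first string in each group (business_metric_one, business_metric_two, and so on). They are not in an ordered list, so I can't select them by index and have to use grep. At first I tried: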
.*metric.*[^_dk|^_none]$
But this doesn't seem to work. Any ideas?
Answer 0 (score: 2)
You need a PCRE pattern (perl=TRUE) to filter the character vector:
x <- c("business_metric_one","business_metric_one_dk","business_metric_one_none","business_metric_two","business_metric_two_dk","business_metric_two_none")
grep("metric(?!.*_(?:dk|none))", x, value=TRUE, perl=TRUE)
## => [1] "business_metric_one" "business_metric_two"
See the R demo.
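The metric(?!.*_(?:dk|none)) pattern matches:
- metric - the literal substring metric
- (?!.*_(?:dk|none)) - a negative lookahead that fails the match if, after any 0+ characters other than line breaks, there is a _ followed by either dk or none.
See the regex demo.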
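To make the contrast with the attempt from the question concrete, here is a minimal sketch. It assumes the same six names as above. The variable names attempt and kept are introduced only for illustration. The bracket expression in the question is a single character class, not an alternation of suffixes, which is why it matches nothing here.

```r
x <- c("business_metric_one", "business_metric_one_dk", "business_metric_one_none",
       "business_metric_two", "business_metric_two_dk", "business_metric_two_none")

# The question's attempt: [^_dk|^_none] is one bracket expression, i.e. a single
# character that is none of _ d k | ^ n o e. Every name here ends in e, k or o,
# so nothing matches.
attempt <- grep(".*metric.*[^_dk|^_none]$", x, value = TRUE)

# The lookahead version rejects a name only when _dk or _none follows "metric".
kept <- grep("metric(?!.*_(?:dk|none))", x, value = TRUE, perl = TRUE)

print(attempt)  # character(0)
print(kept)     # "business_metric_one" "business_metric_two"
```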
Note: if you only need to match values that contain metric and do not end with _dk or _none, use a variation of the pattern, metric.*$(?<!_dk|_none). The negative lookbehind (?<!_dk|_none) fails the match if the string ends with _dk or _none.
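The two patterns differ only when _dk or _none appears in the middle of a name. The sketch below illustrates that under one assumption: business_metric_dk_total is a made-up name that does not come from the question.

```r
# "business_metric_dk_total" is a hypothetical name, used for illustration only.
y <- c("business_metric_one", "business_metric_one_dk", "business_metric_dk_total")

# Lookahead: rejects _dk or _none anywhere after "metric".
lookahead <- grep("metric(?!.*_(?:dk|none))", y, value = TRUE, perl = TRUE)

# Lookbehind at the end of the string: rejects only a trailing _dk or _none.
lookbehind <- grep("metric.*$(?<!_dk|_none)", y, value = TRUE, perl = TRUE)

print(lookahead)   # "business_metric_one"
print(lookbehind)  # "business_metric_one" "business_metric_dk_total"
```

Which variant is appropriate depends on whether a mid-name _dk should count as a suffix marker in your data.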
Answer 1 (score: 0)
You could also do:
grep("^([[:alpha:]]+_){2}[[:alpha:]]+$", string, value = TRUE)
# [1] "business_metric_one" "business_metric_two"
Or use grepl to match dk and none, then negate the logical vector when indexing the original string:
string[!grepl("(dk|none)", string)]
# [1] "business_metric_one" "business_metric_two"
Or with a more specific pattern:
string[!grepl("business_metric_[[:alpha:]]+_(dk|none)", string)]
# [1] "business_metric_one" "business_metric_two"
Data:
string = c("business_metric_one","business_metric_one_dk","business_metric_one_none","business_metric_two","business_metric_two_dk","business_metric_two_none")
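One caveat with the unanchored (dk|none) pattern above: it drops any name that contains those letters anywhere, not only as a suffix. A possible tightening, sketched here as an assumption rather than as part of the original answer, is to anchor the suffix with _ and $. The name business_metric_nonessential is invented for the example.

```r
# "business_metric_nonessential" is a hypothetical name, used for illustration only.
z <- c("business_metric_one", "business_metric_one_dk",
       "business_metric_one_none", "business_metric_nonessential")

# Unanchored: "none" inside "nonessential" also counts as a hit.
loose <- z[!grepl("(dk|none)", z)]

# Anchored: only a trailing _dk or _none excludes the name.
strict <- z[!grepl("_(dk|none)$", z)]

print(loose)   # "business_metric_one"
print(strict)  # "business_metric_one" "business_metric_nonessential"
```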