Question

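I have string data that I match with a regular expression, but I want to exclude the strings that contain a certain substring.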

dat <- c('long_regex_other_stuff','long_regex_other_random.something')
(dat[grep('long_regex',dat)])
(dat[grep('long_regex.*(?!.*something$)',dat)])

The first grep gives the expected output:

"long_regex_other_stuff"            "long_regex_other_random.something"

How can I get the second grep to work? The desired output is:

"long_regex_other_stuff"

Reference: Regular expression to match a line that doesn't contain a word?

Answer 1

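You need to remove the .* that comes before the negative lookahead in your regex and add it after the lookahead instead. Lookaheads are not supported by R's default regex engine, so grep also needs perl=TRUE: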

> dat <- c('long_regex','long_regex.something')
> (dat[grep('long_regex(?!.*something).*',dat, perl=TRUE)])
[1] "long_regex"
> (dat[grep('long_regex(?!.*\\bsomething\\b).*',dat, perl=TRUE)])
[1] "long_regex"

In this regex, the negative lookahead in long_regex(?!.*something) asserts that the substring something does not appear anywhere after long_regex.
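
To illustrate why the position of .* matters, here is a minimal sketch that compares the pattern from the question (with perl = TRUE added so that it compiles) against the same pattern with the lookahead moved directly after long_regex. The variable names late and early are only illustrative.

```r
dat <- c('long_regex_other_stuff', 'long_regex_other_random.something')

# Lookahead placed after a greedy .*: the .* can consume the whole string,
# and at the end of the string nothing follows, so the negative lookahead
# succeeds for every element.
late <- grepl('long_regex.*(?!.*something$)', dat, perl = TRUE)

# Lookahead placed directly after long_regex: it inspects the entire
# remainder of the string before .* consumes anything.
early <- grepl('long_regex(?!.*something$).*', dat, perl = TRUE)

print(late)   # TRUE TRUE
print(early)  # TRUE FALSE
```

Only the second form rejects the element that ends in something.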

> dat <- c('long_regex_other_stuff','long_regex_other_random.something')
> (dat[grep('long_regex(?!.*\\bsomething\\b).*',dat, perl=TRUE)])
[1] "long_regex_other_stuff"
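
If a lookahead is not strictly required, one alternative (not part of the answer above) is to combine two plain grepl calls: one for the pattern that must match and one, negated, for the substring to exclude. This is a sketch under the assumption that the unwanted substring always sits at the end of the string, as in the question; keep and result are illustrative names.

```r
dat <- c('long_regex_other_stuff', 'long_regex_other_random.something')

# TRUE where long_regex matches and the string does not end in "something"
keep <- grepl('long_regex', dat) & !grepl('something$', dat)

result <- dat[keep]
print(result)  # "long_regex_other_stuff"
```

This version needs no perl = TRUE, but it checks the two conditions independently rather than requiring something to come after long_regex.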
