Question

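I am trying to write the correct regex pattern to match the following condition:

(contains the word other) OR (contains both us and car)

This code works as expected: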

str_detect(c('us cars',
             'u.s. cars',
             'us and bikes',
             'other'),
           regex('other|((?=.*us)(?=.*car))',
                 ignore_case = TRUE))
[1]  TRUE FALSE FALSE  TRUE

However, if I try to cover variants of U.S. (United States) such as us and u.s., the pattern no longer works.

str_detect(c('us cars',
             'u.s. cars',
             'us and bikes',
             'other'),
           regex('other|((?=.*us)(?=.*u.s.)(?=.*u.s)(?=.*car))',
                 ignore_case = TRUE))
[1] FALSE FALSE FALSE  TRUE

What is the problem here? Thanks!

Answer 1

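The dot is a regex metacharacter, so it needs to be escaped if you intend it as a literal dot. I am not very familiar with the stringr package, but you can do this with grepl: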

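x <- c('us cars', 'u.s. cars', 'us and bikes', 'other')
# perl = TRUE enables lookaheads; ignore.case = TRUE mirrors ignore_case = TRUE in the question
# cars? lets the plural "cars" in the sample data satisfy the word boundary
matches <- grepl("\\bother\\b|((?=.*\\bu\\.?s\\.?(?=\\s|$))(?=.*\\bcars?\\b).*)", x, perl = TRUE, ignore.case = TRUE)

Explanation of the regex:

\\bother\\b                        match "other"
|                                  OR
(
    (?=.*\\bu\\.?s\\.?(?=\\s|$))   lookahead and assert that
                                   "us" or "u.s" or "us." or "u.s." appears,
                                   followed by whitespace or end of string
    (?=.*\\bcars?\\b)              lookahead and assert that "car" or "cars" appears
    .*                             match anything
)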
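
To see the metacharacter point in isolation, here is a minimal base R sketch. The test strings (including "uxs") and the variable names are made up for illustration and do not come from the question.

```r
# An unescaped dot matches any single character, so "u.s" also matches "uxs".
unescaped <- grepl("^u.s$", c("u.s", "uxs"), perl = TRUE)
print(unescaped)
# [1] TRUE TRUE

# Escaping the dot makes it a literal dot.
escaped <- grepl("^u\\.s$", c("u.s", "uxs"), perl = TRUE)
print(escaped)
# [1]  TRUE FALSE

# \\.? makes each literal dot optional, which covers all four spellings.
variants <- c("us", "u.s", "us.", "u.s.", "uxs")
optional <- grepl("^u\\.?s\\.?$", variants, perl = TRUE)
print(optional)
# [1]  TRUE  TRUE  TRUE  TRUE FALSE
```

The anchors ^ and $ are only there to keep the demonstration strict; inside a lookahead you would normally use \\b or a trailing (?=\\s|$) instead.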

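One problem with your original pattern is that it consists only of lookaheads, so it never consumes any characters. This is not a complete fix (the dots are still unescaped, and the chained lookaheads still require every variant to be present at once), but this

regex('other|((?=.*us)(?=.*u.s.)(?=.*u.s)(?=.*car))', ignore_case=TRUE)

should become this: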

regex('other|((?=.*us)(?=.*u.s.)(?=.*u.s)(?=.*car).*)', ignore_case=TRUE)
                                                  ^^^ add this
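
To see why the second pattern in the question returns FALSE for everything except 'other', it may help to test each lookahead on its own. Below is a rough base R sketch; the variable names and the simplified pattern at the end are illustrative assumptions, not something taken from the question.

```r
x <- c('us cars', 'u.s. cars', 'us and bikes', 'other')

# Each lookahead from the question's second pattern, tested separately.
lookaheads <- c("(?=.*us)", "(?=.*u.s.)", "(?=.*u.s)", "(?=.*car)")
hits <- sapply(lookaheads, function(p) grepl(p, x, perl = TRUE, ignore.case = TRUE))

# Chained lookaheads are ANDed, so a string must pass all four at once.
all_pass <- rowSums(hits) == ncol(hits)
print(all_pass)
# [1] FALSE FALSE FALSE FALSE

# Assumed alternative: a single lookahead with optional, escaped dots.
simplified <- "other|((?=.*u\\.?s\\.?)(?=.*car))"
result <- grepl(simplified, x, perl = TRUE, ignore.case = TRUE)
print(result)
# [1]  TRUE  TRUE FALSE  TRUE
```

No string satisfies all four lookaheads, so only the other branch can ever match. Note that without \\b the simplified pattern, like the one in the question, also matches us inside longer words such as "bus"; add word boundaries if that matters.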
