Regex to match up to 2 full words and the next word containing the character

Date: 2016-09-27 14:54:58

Tags: php regex pcre

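I developed the following regular expression for use in a search field. The goal is to match up to 2 words before the search term, then the full word containing the term, and everything after it: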

/^
    .*?                 # match anything before, as few times as possible
    (
        (?: 
            [^\s]+\s*   # anything followed by whitespace
        ){1,2}          # match once or twice
        \s*?            # match whitespaces that may be left behind, just in case
        [^\s]*?         # match the beginning of the word, if exists
    )?  
    (foo|bar)           # search term(s)
    ([^\s]*\s*.*)       # whatever is after, with whitespace, if it is the end of the word
$/xi

The problem is that it does not always match correctly. Some examples, when searching for "a":

Fantastic drinks and amazing cakes

Expected match:
$1 = F
$2 = a
$3 = ntastic drinks and amazing cakes

Result:
$1 = Fantastic drinks (space)
$2 = a
$3 = nd amazing cakes

-----------------------------------------

Drinks and party!

Expected match:
$1 = Drinks (space)
$2 = a
$3 = nd party!

Result:
$1 = Drinks and p
$2 = a
$3 = rty!

------------------------------------------

Drinks will be served at the caffetary in 5 minutes

Expected match:
$1 = be served (space)
$2 = a
$3 = t the caffetary in 5 minutes

Result (matches correctly):
$1 = be served (space)
$2 = a
$3 = t the caffetary in 5 minutes

You can experiment with it at https://regex101.com/r/cI7gZ3/1, which includes unit tests.
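To reproduce the three results outside regex101, here is a minimal sketch. It assumes Python's `re` module as a stand-in for PCRE (the question itself targets PHP), with "a" substituted for the search term. `ORIGINAL`, `SAMPLES` and `split_original` are names made up for this illustration.

```python
import re

# The pattern from the question, with "a" as the search term.
# Assumption: Python's re is used in place of PCRE; the constructs used
# here (greedy/lazy quantifiers, x and i flags) exist in both.
ORIGINAL = re.compile(r"""
    ^
    .*?                 # anything before, as few characters as possible
    (
        (?:
            [^\s]+\s*   # a run of non-whitespace plus trailing whitespace
        ){1,2}          # once or twice (greedy)
        \s*?            # whitespace that may be left behind
        [^\s]*?         # beginning of the word, if it exists
    )?
    (a)                 # search term
    ([^\s]*\s*.*)       # rest of the word and whatever comes after
    $
""", re.VERBOSE | re.IGNORECASE)

SAMPLES = [
    "Fantastic drinks and amazing cakes",
    "Drinks and party!",
    "Drinks will be served at the caffetary in 5 minutes",
]


def split_original(text):
    """Return ($1, $2, $3) for the original pattern, or None if no match."""
    m = ORIGINAL.match(text)
    return m.groups() if m else None


for sample in SAMPLES:
    print(split_original(sample))
# ('Fantastic drinks ', 'a', 'nd amazing cakes')
# ('Drinks and p', 'a', 'rty!')
# ('be served ', 'a', 't the caffetary in 5 minutes')
```

The printed tuples correspond to the "Result" blocks listed above.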

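The way in which this fails is odd and hard for me to describe. My guess, however, is that the pattern prefers matches that have 1-2 words before the search term.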

What do you think could be wrong here? What do you think is causing these problems?

1 answer:

Answer 0 (score: 1)

I suggest using lazy versions of \S+ and {1,2} in

(?: 
    \S+?\s* # anything followed by whitespace
){1,2}?

and removing the [^\s]*? # match the beginning of the word, if exists part.
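The difference between a greedy and a lazy quantifier can be seen on a single word. The following is only a small sketch, again assuming Python's `re` module in place of PCRE; the variable names are illustrative.

```python
import re

word = "Fantastic"

# Greedy: \S+ first takes the whole word, then backs off one character at
# a time until an "a" follows, so it stops at the LAST "a" in the word.
greedy_groups = re.match(r"(\S+)(a)", word).groups()

# Lazy: \S+? takes one character, then grows only while no "a" follows,
# so it stops at the FIRST "a" in the word.
lazy_groups = re.match(r"(\S+?)(a)", word).groups()

print(greedy_groups)  # ('Fant', 'a')
print(lazy_groups)    # ('F', 'a')
```

The same effect applies to the group quantifier: a greedy {1,2} tries two repetitions first and keeps them if the rest of the pattern can still match, which is why the original pattern skips over the first "a" in "Fantastic drinks and amazing cakes".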

See the updated regex demo:

^
  .*? # match anything before, as few times as possible
  (
    (?: 
      \S*?\s* # anything followed by whitespace
    ){1,2}?
    \s* # just in case there's whitespace
  )?
  (a) # search term(s)
  (\S*\s*.*) # whatever is after, without whitespace if it is the end of the word
$
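As a quick check of the revised pattern against the three sample strings from the question, here is a minimal sketch. It assumes Python's `re` module as a stand-in for PCRE and the x and i flags of the original pattern; `REVISED` and `split_revised` are hypothetical names.

```python
import re

# The revised pattern from this answer, with "a" as the search term.
# Assumption: Python's re is used in place of PCRE.
REVISED = re.compile(r"""
    ^
    .*?             # anything before, as few characters as possible
    (
        (?:
            \S*?\s* # as little non-whitespace as possible, then whitespace
        ){1,2}?     # once or twice, preferring once
        \s*         # just in case there is whitespace
    )?
    (a)             # search term
    (\S*\s*.*)      # rest of the word and whatever comes after
    $
""", re.VERBOSE | re.IGNORECASE)


def split_revised(text):
    """Return ($1, $2, $3) for the revised pattern, or None if no match."""
    m = REVISED.match(text)
    return m.groups() if m else None


print(split_revised("Fantastic drinks and amazing cakes"))
# ('F', 'a', 'ntastic drinks and amazing cakes')
print(split_revised("Drinks and party!"))
# ('Drinks ', 'a', 'nd party!')
print(split_revised("Drinks will be served at the caffetary in 5 minutes"))
# ('be served ', 'a', 't the caffetary in 5 minutes')
```

All three tuples agree with the "Expected match" values in the question. When the term comes from user input, it would normally be escaped first (preg_quote in PHP, re.escape in Python) before being placed in the pattern.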