Why doesn't this PHP regex match all of the inputs?

Date: 2013-01-16 23:14:28

Tags: php regex

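Why does the following regex, $regex = '/\b(V|E)?\d{1,2}? ?\d{3} ?\d{3}\b/i';, fail to match all of the inputs below?

I thought that (V|E)?\d{1,2}? ? would make the letter, the first one or two digits, and the first space optional.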

INPUT

<?php

$sms = array(
    'test test test 11 111 111 test test test',
    'test test test 1 111 111 test test test',
    'test test test 111 111 test test test', // does not match
    'test test test test test test 11111111',
    'test test test 1111111 test test test',
    'test test test 111111 test test test', // does not match
    'test test test E11 111 111 test test test',
    'test test test V1 111 111 test test test',
    'test test test V111 111 test test test', // does not match
    'test test test V11111111 test test test',
    'test test test V1111111 test test test',
    'test test test E111111 test test test', // does not match
    'test test test V 11 111 111 test test test',
    'test test test V 1 111 111 test test test',
    'test test test E 111 111 test test test', // does not match
    'test test test V 11111111 test test test',
    'test test test V 1111111 test test test',
    'test test test V 111111 test test test', //does not match
    'test test test V11 111 111 test test test',
    'test test test V1 111 111 test test test',
    'test test test E111 111 test test test', //does not match
    'test test test V11111111 test test test',
    'V1111111 test test test  test test test',
    'test test test V111111 test test test', // does not match
);

$regex = '/\b(V|E)?\d{1,2}? ?\d{3} ?\d{3}\b/i';
$noMatches = 0;
$index = 0;
foreach($sms as $v) {
    $match = preg_match($regex, $v, $matches);

    if($match) {
        // Uncomment to inspect the successful matches:
        //print_r($matches);
        //echo "$v match!\n";
    }
    else {
        echo "$index - $v does NOT match!\n";
        $noMatches++;
    }
    $index++;
}
$total = count($sms);
echo "\n\nTotal: $total\nNo Matches: $noMatches\n";

Output

$ php test-regex.php 
2 - test test test 111 111 test test test does NOT match!
5 - test test test 111111 test test test does NOT match!
8 - test test test V111 111 test test test does NOT match!
11 - test test test E111111 test test test does NOT match!
14 - test test test E 111 111 test test test does NOT match!
17 - test test test V 111111 test test test does NOT match!
20 - test test test E111 111 test test test does NOT match!
23 - test test test V111111 test test test does NOT match!


Total: 24
No Matches: 8

Edit:

Using mario's suggestion, the regex is now $regex = '/\b(V|E)?\d{0,2} ?\d{3} ?\d{3}\b/i';. Why does this regex fail to capture the letter V or E in some cases?

$output = array(
    'test test test E11 111 111 test test test' => 'E11 111 111',
    'test test test V1 111 111 test test test' => 'V1 111 111',
    'test test test V111 111 test test test' => 'V111 111',
    'test test test V11111111 test test test' => 'V11111111',
    'test test test V1111111 test test test' => 'V1111111',
    'test test test E111111 test test test' => 'E111111',
    'test test test V 11 111 111 test test test' => '11 111 111', // Missing Letter
    'test test test V 1 111 111 test test test' => '1 111 111', // Missing Letter
    'test test test E 111 111 test test test' => 'E 111 111',
    'test test test V 11111111 test test test' => '11111111', // Missing Letter
    'test test test V 1111111 test test test' => '1111111', // Missing Letter
    'test test test V 111111 test test test' => 'V 111111',
    'test test test V11 111 111 test test test' => 'V11 111 111',
    'test test test V1 111 111 test test test' => 'V1 111 111',
    'test test test E111 111 test test test' => 'E111 111',
    'test test test V11111111 test test test' => 'V11111111',
    'V1111111 test test test  test test test' => 'V1111111',
    'test test test V111111 test test test' => 'V111111',
    'V 1111111 test test test' => '1111111', // Missing Letter
    'test test test V 1111111 test test test' => '1111111', // Missing Letter
);
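
A minimal sketch of what appears to be happening in the rows marked as missing the letter. The pattern's only optional space sits after \d{0,2}, so a space directly after V or E can be absorbed only when \d{0,2} matches nothing, that is, when exactly six digits follow. The $variant pattern and the firstMatch() helper are hypothetical and are not taken from the thread.

```php
<?php
// Hypothetical helper: return the text of the first match, or null if there is none.
function firstMatch(string $regex, string $subject): ?string
{
    return preg_match($regex, $subject, $m) === 1 ? $m[0] : null;
}

$current = '/\b(V|E)?\d{0,2} ?\d{3} ?\d{3}\b/i';        // pattern from the edit
$variant = '/\b(?:(V|E) ?)?\d{0,2} ?\d{3} ?\d{3}\b/i';  // assumption: allow one space after the letter

$subjects = [
    'test V 11 111 111 test', // 8 digits after "V "
    'test E 111 111 test',    // 6 digits after "E "
    'test V 11111111 test',   // 8 digits after "V "
];

// For each subject, record [match with $current, match with $variant].
$report = [];
foreach ($subjects as $s) {
    $report[$s] = [firstMatch($current, $s), firstMatch($variant, $s)];
}
print_r($report);
```

With $current, only the six-digit subject keeps its letter. Tying the extra space to the letter inside one optional group keeps the variant from matching a stray leading space when no letter is present.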

3 answers:

Answer 0 (score: 2)

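A ? acts as an "optional" quantifier only when it follows a group, a literal character, or a character class.

If a ? comes after another quantifier (*, +, {n,m}), it only makes that quantifier less greedy. This means the regex will try to match the smallest possible amount.

So \d{1,2}? does not mean optional. It means match one or two digits, but prefer to match just one. You meant to write \d{0,2} instead.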
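
The difference between a lazy and an optional quantifier can be checked in isolation. This is a small sketch; the variable names and test strings are made up for illustration.

```php
<?php
// \d{1,2}? is lazy: it still needs at least one digit, but takes as few as it can.
preg_match('/\d{1,2}?/', '42', $lazy);     // $lazy[0] is '4'
preg_match('/\d{1,2}/', '42', $greedy);    // $greedy[0] is '42'

// When the rest of the pattern demands it, the lazy quantifier expands to two digits.
preg_match('/^\d{1,2}?$/', '42', $lazyAnchored); // $lazyAnchored[0] is '42'

// With zero digits available, {1,2}? fails while {0,2} succeeds.
$lazyOnEmpty     = preg_match('/^\d{1,2}?$/', ''); // 0
$optionalOnEmpty = preg_match('/^\d{0,2}$/', '');  // 1

var_dump($lazy[0], $greedy[0], $lazyAnchored[0], $lazyOnEmpty, $optionalOnEmpty);
```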

Answer 1 (score: 1)

They don't match because the regex requires at least 7 digits in total:

/\b(V|E)?\d{1,2}? ?\d{3} ?\d{3}\b/
             |        |      |
             |        |      \-------->  3 digits exactly
             |        \--------------->  3 digits exactly
             \------------------------>  1 or 2 digits (prefers 1, but will match
                                         2 if there are 8 digits in a row)

All of the failing inputs are one digit short.
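
The digit count can be confirmed with a quick sketch that feeds bare digit runs of different lengths to the original pattern and to the \d{0,2} version. The matchesDigitRun() helper is hypothetical and exists only for this illustration.

```php
<?php
// Hypothetical helper: does the pattern match a bare run of $n digits?
function matchesDigitRun(string $regex, int $n): bool
{
    return preg_match($regex, str_repeat('1', $n)) === 1;
}

$original = '/\b(V|E)?\d{1,2}? ?\d{3} ?\d{3}\b/i'; // needs 7 or 8 digits
$relaxed  = '/\b(V|E)?\d{0,2} ?\d{3} ?\d{3}\b/i';  // needs 6 to 8 digits

// For each length, record [original matches?, relaxed matches?].
$results = [];
foreach ([5, 6, 7, 8, 9] as $n) {
    $results[$n] = [matchesDigitRun($original, $n), matchesDigitRun($relaxed, $n)];
}
var_dump($results);
```

Only the six-digit run separates the two patterns. Runs of five or nine digits fail for both because of the \b anchors at either end.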

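Answer 2 (score: 1)

If you want the first part to be entirely optional, you have to wrap it in parentheses and append a ?. You can also use a character class in place of V|E:
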
(?:[VE]\d{1,2} )?
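
A sketch of how this fragment might behave inside a full pattern. The answer gives only the optional group, so appending the two three-digit groups from the question is an assumption, and the sample strings are made up.

```php
<?php
// Assumption: the optional group is followed by the two 3-digit groups from the question.
$regex = '/\b(?:[VE]\d{1,2} )?\d{3} ?\d{3}\b/i';

$samples = [
    'test V11 111 111 test',
    'test E1 111111 test',
    'test 111 111 test',
    'test 11 111 111 test',
];

// Map each sample to the text of its first match (null if nothing matched).
$found = [];
foreach ($samples as $sample) {
    $found[$sample] = preg_match($regex, $sample, $m) === 1 ? $m[0] : null;
}
print_r($found);
```

In this sketch the leading digits are picked up only together with a letter, so the last sample yields 111 111 without the 11. Whether that is acceptable depends on the inputs that need to be supported.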