Question

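I'm not very good with regular expressions, and I hope someone can explain this to me. I found the following in some code I was debugging, and I'd like to know why I always get false in this scenario.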

I know that \p{L} matches a single code point in the "letter" category, and that 0-9 are digits.

$regExp = /^\s*
     (?P([0-2]?[1-9]|[12]0|3[01]))\s+
     (?P\p{L}+?)\s+
     (?P[12]\d{3})\s*$/i;

    $value = '12 Février 2015' ;
    $matches = array();

    $match = preg_match($regExp, $value, $matches);

As additional information, I narrowed it down to this:

$match = preg_match("/^\s*(?P<monthDay>([0-2]?[1-9]|[12]0|3[01]))\s+(?P<monthNameFull>\p{L}+?)\s+(?P<yearFull>[12]\d{3})\s*$/i", "18 Février 2015");
var_dump($match); //It will print int(0).

But if the value is 18 February 2015, it prints int(1). Why is that? I would expect 1 for both values, since \p{L} is supposed to accept Unicode characters.

Answer 1

$regExp = '/^\s*(?P<d>([0-2]?[1-9]|[12]0|3[01]))\s+(?P<m>\p{L}+?)\s+(?P<y>[12]\d{3})\s*$/usD';

$value = '12 Février 2015';
$matches = array();

$match = preg_match($regExp, $value, $matches);

var_dump($matches);

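Unless you want errors, you have to put a <name> after (?P, and for Unicode, multi-line strings you need the usD flags (u: UTF-8 mode, s: the dot also matches newlines, D: $ matches only at the very end of the string). They are easy to remember, like US dollars...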
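To make those flags a bit more concrete, here is a minimal sketch. The group names (day, month, year) and the test strings are my own illustrative choices, not part of the original code, and I dropped the trailing \s* so that the effect of D on a final newline is visible. It assumes PHP 7 or later for the "\u{E9}" escape.

```php
<?php
// Illustrative group names (day, month, year); not taken from the original code.
$pattern  = '/^\s*(?P<day>0?[1-9]|[12][0-9]|3[01])\s+(?P<month>\p{L}+)\s+(?P<year>[12]\d{3})$/u';
$patternD = $pattern . 'D';   // same pattern with the D (dollar-end-only) flag added

$value = "12 F\u{E9}vrier 2015";   // "12 Février 2015" encoded as UTF-8

$ok = preg_match($pattern, $value, $m);

// Named groups are available under their names (and also by number).
$day   = $m['day'];
$month = $m['month'];
$year  = $m['year'];

// Without D, "$" also matches just before a final newline; with D it does not.
$withoutD = preg_match($pattern,  $value . "\n");
$withD    = preg_match($patternD, $value . "\n");

var_dump($ok, $day, $month, $year, $withoutD, $withD);
```

The s flag has no effect on this particular pattern because it contains no dot; it only matters once a . has to cross a line break.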

Answer 2

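Named groups aren't needed here, and their syntax looks wrong in the question anyway. This cleaned-up version should work; the x flag makes the spaces used for layout insignificant, and u is still required for accented letters such as é:

/^ \s*([0-2]?[1-9]|[12]0|3[01])\s+ \p{L}+?\s+ [12]\d{3}\s* $/ixu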

The day-of-month pattern is also easier to read as:

(0?[1-9]|[12][0-9]|3[01])
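If you want to convince yourself that the rewritten day pattern accepts exactly the same strings as the original one, a brute-force comparison over every one- and two-digit string is enough. This is only a sketch; the anchors and the D flag are added by me so that each candidate has to match in full.

```php
<?php
$original  = '/^([0-2]?[1-9]|[12]0|3[01])$/D';
$rewritten = '/^(0?[1-9]|[12][0-9]|3[01])$/D';

$differences = [];   // candidates on which the two patterns disagree
$accepted    = 0;    // how many candidates the rewritten pattern accepts

// Try "00".."99" plus the unpadded forms "0".."9".
for ($i = 0; $i < 100; $i++) {
    $candidates = [sprintf('%02d', $i)];
    if ($i < 10) {
        $candidates[] = (string) $i;
    }
    foreach ($candidates as $s) {
        $a = preg_match($original, $s);
        $b = preg_match($rewritten, $s);
        if ($a !== $b) {
            $differences[] = $s;
        }
        $accepted += $b;
    }
}

var_dump($differences);   // empty array: both patterns agree on every candidate
echo $accepted, "\n";     // 40: "1".."9", "01".."09" and "10".."31"
```

Note that neither pattern checks the day against the month, so something like 31 for February still passes.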

Answer 3

Use the u modifier for Unicode:

// Quoted so it is valid PHP; x lets the pattern span several lines.
$regExp = '/^\s*
   (?P<monthDay>([0-2]?[1-9]|[12]0|3[01]))\s+
   (?P<monthNameFull>\p{L}+?)\s+
   (?P<yearFull>[12]\d{3})\s*$/ux';
//                      here __^

The i modifier is not needed: \p{L} already matches both uppercase and lowercase letters.
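Here is a small sketch comparing the two modifiers on the value from the question. My understanding is that without u, PCRE works byte by byte, so the two-byte UTF-8 encoding of é is not seen as a single letter and the month name cannot be matched; treat that explanation as an assumption rather than something stated in this thread. The uppercase test string is my own addition, and the "\u{...}" escapes assume PHP 7 or later.

```php
<?php
$value = "18 F\u{E9}vrier 2015";   // "18 Février 2015" encoded as UTF-8
$body  = '^\s*(?P<monthDay>([0-2]?[1-9]|[12]0|3[01]))\s+'
       . '(?P<monthNameFull>\p{L}+?)\s+(?P<yearFull>[12]\d{3})\s*$';

$withI = preg_match('/' . $body . '/i', $value);       // byte mode, as in the question
$withU = preg_match('/' . $body . '/u', $value, $m);   // UTF-8 mode

// Uppercase month name, still without the i flag.
$upper = preg_match('/' . $body . '/u', "18 F\u{C9}VRIER 2015");

echo bin2hex("\u{E9}"), "\n";   // c3a9: é takes two bytes in UTF-8
var_dump($withI, $withU, $upper, $m['monthNameFull']);
```

With u the month name is captured whole, and the uppercase variant matches without i.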

Answer 4

I figured out a fix: use /u instead of /i.

$match = preg_match("/^\s*(?P<monthDay>([0-2]?[1-9]|[12]0|3[01]))\s+(?P<monthNameFull>\p{L}+?)\s+(?P<yearFull>[12]\d{3})\s*$/u", "18 Février 2015");
var_dump($match); //It will print int(1).

Thanks, everyone, for the help.

REGEXP returns false on special characters
