PHP 7 preg_replace PREG_JIT_STACKLIMIT_ERROR with a simple string

Date: 2017-06-27 11:17:10

Tags: php regex preg-replace pcre php-7

I know others have asked about this error, but I can't see how this regex or the subject string could be any simpler.

To me this looks like a bug, but before reporting it to PHP I thought I'd double-check and get some help to see whether it can be simplified.

Here is a small test script with 2 strings, one with 1024 x's and one with 1023:

// 1024 x's
$str = '_' . str_repeat('x', 1024);

// Outputs nothing (bug?)
echo preg_replace('/(?<=[^\w]|^)_([^_\n\t ](.|\n(?!\n))*?)_(?=[^\w]|$)/', '[i]${1}[/i]', $str); 

echo "\n\n";

// 1023 x's
$str = '_' . str_repeat('x', 1023);

// Outputs the unchanged string as expected
echo preg_replace('/(?<=[^\w]|^)_([^_\n\t ](.|\n(?!\n))*?)_(?=[^\w]|$)/', '[i]${1}[/i]', $str);
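The "outputs nothing" case is what preg_replace() does on a PCRE failure: it returns null, and echo prints null as an empty string. A minimal sketch for making the failure visible, assuming PHP 7.0+ (where the PREG_JIT_STACKLIMIT_ERROR constant exists); the helper name checked_preg_replace is hypothetical. Whether the JIT error actually shows up depends on the PHP/PCRE build and its JIT stack size, so no particular output is promised here.

```php
<?php
// Hypothetical helper: like preg_replace(), but throws instead of returning null,
// so a PCRE failure is not silently echoed as an empty string.
function checked_preg_replace(string $pattern, string $replacement, string $subject): string
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        // Map preg_last_error() codes to their constant names for a readable message.
        $names = [
            PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
            PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
            PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
            PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
            PREG_BAD_UTF8_OFFSET_ERROR => 'PREG_BAD_UTF8_OFFSET_ERROR',
            PREG_JIT_STACKLIMIT_ERROR  => 'PREG_JIT_STACKLIMIT_ERROR',
        ];
        $code = preg_last_error();
        throw new RuntimeException('preg_replace failed: ' . ($names[$code] ?? "error code $code"));
    }
    return $result;
}

// The pattern and subject from the question.
$pattern = '/(?<=[^\w]|^)_([^_\n\t ](.|\n(?!\n))*?)_(?=[^\w]|$)/';
$subject = '_' . str_repeat('x', 1024);

try {
    $out = checked_preg_replace($pattern, '[i]${1}[/i]', $subject);
    echo 'OK, ', strlen($out), " bytes\n";
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```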

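As you can see, the error only occurs with the slightly longer string (more than 1024 characters). The strings this will process can be of any length: forum posts, news articles, etc.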

Regex explanation

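I'm trying to do some Markdown parsing to convert strings like _I am italic_ into the legacy markup we still use in some places from an old site. The reason/purpose doesn't matter. What matters is that I believe this should work fine, and in fact it does, everywhere except PHP 7.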

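The underscores should only be matched when they wrap a standalone word or sentence. The opening underscore must not match if it is preceded by any "word" character, and the closing underscore must not match if it is followed by any "word" character.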

Environment: CentOS 7, PHP 7.1.6

1 answer:

Answer 0 (score: 1)

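Important note
Avoid (.|\n)*? and (.|\r?\n)*? patterns, as they cause excessive redundant backtracking. To match any character, you can usually use . with the DOTALL flag (the s modifier in PHP), or, in JavaScript, the [^] or [\s\S] constructs. For details, see How do I match any character across multiple lines in a regular expression?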
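A small illustration of the note above, as a sketch: both patterns below capture the text between < and > including a newline, but the first spends one group iteration per character while the second uses a single lazy dot made newline-aware by the s modifier. The sample subject is made up for the demonstration.

```php
<?php
// Alternation-based "any character": every character is a separate group iteration.
$alternation = '/<((?:.|\n)*?)>/';
// Preferred: plain lazy dot; the s (DOTALL) modifier lets . match \n as well.
$dotall = '/<(.*?)>/s';

$subject = "<first line\nsecond line>";
preg_match($alternation, $subject, $m1);
preg_match($dotall, $subject, $m2);

var_dump($m1[1] === $m2[1]); // bool(true)
```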

The current issue

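The (.|\n(?!\n))*? pattern is very inefficient and causes a lot of redundant backtracking when anything follows it in the pattern (at the very end of a pattern a lazy *? would match nothing, so there it would make no sense at all). The further to the left of the pattern it appears, the worse the performance. With PCRE JIT, which PHP 7 enables by default (pcre.jit=1), every iteration of such a group needs backtracking stack space, so a long enough subject exhausts the JIT stack; preg_replace() then returns null (hence the empty output) and preg_last_error() reports PREG_JIT_STACKLIMIT_ERROR.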

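Since all it does is lazily match either any character other than a newline, or a newline that is not followed by another newline, you can rewrite that part as .*?(?:\R(?!\R).*?)*. Each run of non-newline characters is then consumed by a single .*? instead of one group iteration per character, so the group only repeats once per line break. The full pattern becomes: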

'~\b_([^_\n\t ].*?(?:\R(?!\R).*?)*)_\b~'

See the regex demo.
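A minimal sketch of the rewritten pattern in use; the function name italic_to_legacy is hypothetical, and the replacement string '[i]${1}[/i]' is the one from the question. It throws if preg_replace() ever returns null instead of silently producing an empty string.

```php
<?php
// Hypothetical helper name; the pattern is the answer's rewrite.
function italic_to_legacy(string $text): string
{
    $pattern = '~\b_([^_\n\t ].*?(?:\R(?!\R).*?)*)_\b~';
    $result = preg_replace($pattern, '[i]${1}[/i]', $text);
    if ($result === null) {
        // preg_replace() returns null on a PCRE failure; surface it instead of echoing "".
        throw new RuntimeException('preg_replace failed, preg_last_error() = ' . preg_last_error());
    }
    return $result;
}

echo italic_to_legacy('_I am italic_'), "\n";         // [i]I am italic[/i]
echo italic_to_legacy('keep snake_case_words'), "\n"; // keep snake_case_words (unchanged)
echo italic_to_legacy("_line one\nline two_"), "\n";  // [i]line one\nline two[/i] (one line break is allowed)
```

Because runs of ordinary characters go through a single .*? token, long subjects such as the question's 1024-x string no longer need one group iteration per character.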

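Notes:

  • (?<=[^\w]|^) = \b here, because the lookbehind is followed by _ (a word char)
  • (?=[^\w]|$) = \b here, because the lookahead is preceded by _ (a word char)
  • .*?(?:\R(?!\R).*?)* - matches:
    • .*? - any 0+ characters other than a newline, as few as possible, then
    • (?:\R(?!\R).*?)* - zero or more sequences of:
      • \R(?!\R) - a line break sequence not followed by another line break sequence (\R matches any line break sequence, e.g. \n, \r\n or \r)
      • .*? - any 0+ characters other than a newline, as few as possible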
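The equivalences listed above can be spot-checked by running the question's pattern and the rewrite side by side. This is a sketch on a few made-up samples, not a proof. The samples only use \n line breaks, because \R also matches \r and \r\n, so inputs containing those could be treated differently by the two patterns.

```php
<?php
// Original pattern from the question and the rewrite from the answer.
$old = '/(?<=[^\w]|^)_([^_\n\t ](.|\n(?!\n))*?)_(?=[^\w]|$)/';
$new = '~\b_([^_\n\t ].*?(?:\R(?!\R).*?)*)_\b~';

// Short, LF-only samples.
$samples = [
    '_I am italic_',
    'a _b_ c',
    'snake_case_name',     // underscores inside a word: no match
    'x _one_ and _two_ y',
    "_a\nb_",              // single line break: the italic span continues
    "_a\n\nb_",            // blank line: no match
];

foreach ($samples as $s) {
    $a = preg_replace($old, '[i]${1}[/i]', $s);
    $b = preg_replace($new, '[i]${1}[/i]', $s);
    // Each line is expected to end in "same".
    printf("%-25s %s\n", json_encode($s), $a === $b ? 'same' : 'DIFFERENT');
}
```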