PHP preg_replace_callback chokes on "a:" values

Posted: 2014-03-06 19:46:39

Tags: php regex string

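We have built a platform that lets users add special <#tags#> inside HTML input attributes. I use preg_replace_callback to find every matching input in the form body string, process each one, and return the modified string for the whole form, including all of the updated input elements.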

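I have narrowed the problem down to the last attribute value starting with a run of letters followed by a colon. That is the only case that breaks the regex and makes preg_replace_callback fail with PREG_BACKTRACK_LIMIT_ERROR. For example,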
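For comparison, the same error can be reproduced with a much smaller pattern that has nothing to do with colons. The sketch below is an illustration adapted from the example on the PHP manual's preg_last_error() page, not the platform's code. It nests one quantifier inside another: (?:\D+|<\d+>)* can split a run of n non-digits in roughly 2^(n-1) ways, and when the match finally fails the engine tries them all until it reaches pcre.backtrack_limit. It also switches off the PCRE JIT so the interpreter's limit applies the way it did on PHP 5.3, which had no JIT.

```php
<?php
// Use the PCRE interpreter rather than the JIT (PHP 5.3 had no JIT), so that
// pcre.backtrack_limit is enforced the same way as on the servers in question.
ini_set('pcre.jit', '0');

// Nested quantifiers: (?:\D+ ...)* can split a run of n non-digits in about
// 2^(n-1) ways, and all of them are tried before the match is abandoned.
$pattern = '/(?:\D+|<\d+>)*[!?]/';
$subject = str_repeat('foobar ', 10); // 70 non-digits, no "!" or "?"

$result = preg_match($pattern, $subject);

var_dump($result);                                          // bool(false)
var_dump(preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR); // bool(true)
```

The failure depends on the shape of the pattern (overlapping, nested repeats) and on the match ultimately failing, not on any particular character in the subject.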

<input onclick="javascript:blah();"> 

breaks it. I have told the developers that they should use onclick="blah()" instead, but this used to work and browsers support it, so they still want it to work.

<input onclick=":blah();">

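does not break it. This makes me think there is some kind of internal storage that uses "key:value" pairs to store references or something, and the data being parsed is breaking that pattern.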

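One really strange thing is that the code gives different results on Google App Engine's PHP and on PHP 5.3.3 running on CentOS: the native PHP raises the error in more cases.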
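One plausible cause of the difference (an assumption; neither server's configuration is shown here) is that the two environments bundle different PCRE library versions or use different pcre.backtrack_limit / pcre.recursion_limit settings. A quick way to compare them is to run the same few lines on both:

```php
<?php
// Print the settings that decide when preg_* functions give up on a pattern.
// Run this on both servers and compare the output.
echo 'PHP version:          ', PHP_VERSION, "\n";
echo 'PCRE library version: ', PCRE_VERSION, "\n";
echo 'pcre.backtrack_limit: ', ini_get('pcre.backtrack_limit'), "\n";
echo 'pcre.recursion_limit: ', ini_get('pcre.recursion_limit'), "\n";
```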

Here is the test code and the test results:

<?php

process_string("<input type=\"button\" value=\"update google doc\" onclick=\"javascript:getgoogledoc();\">");
process_string("<input type=\"button\" value=\"update google doc\" onclick=\":getgoogledoc();\">");
process_string("<input type=\"button\" value=\"update google doc\" onclick=\"getgoogledoc();\">");
process_string("<input type=\"button\" value=\"update google doc\" onclick=\"getgoogledoc();\" newattribute=\"javascript:test();\">");
process_string("<input type=\"button\" value=\"update google doc\" onclick=\"a:getgoogledoc();\">");
process_string("<input type=\"a:button\" value=\"javascript:update google doc\">");
process_string("<input type=\"button\" value=\"javascript:update google doc\" <# this makes it match #> onclick=\"javascript:getgoogledoc();\">");
process_string("<input type=\"button\" value=\"javascript:update google doc\" <# this makes it match #> onclick=\"getgoogledoc();\">");

function process_string($string) {
    echo "<p><b>NEW TEST</b><br />initial string:<br />";
    echo htmlspecialchars($string);
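    // preg_replace_callback() returns NULL if matching fails (for example when
    // pcre.backtrack_limit is exceeded), so $string is NULL afterwards and
    // nothing is printed for it.
    $string = preg_replace_callback(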
        '/<\s*input\s+((\s*(\w+)\s*=\s*(\'(\\\\\\\\|\\\\\'|[^\'])*\'|"(\\\\\\\\|\\\\"|[^"])*"|(\w+))|\s*(\w+))*\s*)<#\s*(.*?)\s*#>((\s*(\w+)\s*=\s*(\'(\\\\\\\\|\\\\\'|[^\'])*\'|"(\\\\\\\\|\\\\"|[^"])*"|(\w+))|\s*(\w+))*\s*)(\/\s*|)>/is',
        function($matches) {
            echo "<br />matched something...";
            return $matches[0];
        },
        $string
    );
    echo "<br />ok... ran the regex replace callback... string is now:<br />";
    echo htmlspecialchars($string);
    $last_error = preg_last_error();
    echo "<br />the last regex error was: $last_error";
    if($last_error==PREG_NO_ERROR) {
        echo "<br />that is a PREG_NO_ERROR";
    }
    if($last_error==PREG_INTERNAL_ERROR) {
        echo "<br />that is a PREG_INTERNAL_ERROR";
    }
    if($last_error==PREG_BACKTRACK_LIMIT_ERROR) {
        echo "<br />that is a PREG_BACKTRACK_LIMIT_ERROR";
    }
    if($last_error==PREG_RECURSION_LIMIT_ERROR) {
        echo "<br />that is a PREG_RECURSION_LIMIT_ERROR";
    }
    if($last_error==PREG_BAD_UTF8_ERROR) {
        echo "<br />that is a PREG_BAD_UTF8_ERROR";
    }
    if($last_error==PREG_BAD_UTF8_OFFSET_ERROR) {
        echo "<br />that is a PREG_BAD_UTF8_OFFSET_ERROR";
    }
}

?>

Results:

NEW TEST
initial string:
<input type="button" value="update google doc" onclick="javascript:getgoogledoc();">
ok... ran the regex replace callback... string is now:

the last regex error was: 2
that is a PREG_BACKTRACK_LIMIT_ERROR

NEW TEST
initial string:
<input type="button" value="update google doc" onclick=":getgoogledoc();">
ok... ran the regex replace callback... string is now:
<input type="button" value="update google doc" onclick=":getgoogledoc();">
the last regex error was: 0
that is a PREG_NO_ERROR

NEW TEST
initial string:
<input type="button" value="update google doc" onclick="getgoogledoc();">
ok... ran the regex replace callback... string is now:
<input type="button" value="update google doc" onclick="getgoogledoc();">
the last regex error was: 0
that is a PREG_NO_ERROR

NEW TEST
initial string:
<input type="button" value="update google doc" onclick="getgoogledoc();" newattribute="javascript:test();">
ok... ran the regex replace callback... string is now:

the last regex error was: 2
that is a PREG_BACKTRACK_LIMIT_ERROR

NEW TEST
initial string:
<input type="button" value="update google doc" onclick="a:getgoogledoc();">
ok... ran the regex replace callback... string is now:

the last regex error was: 2
that is a PREG_BACKTRACK_LIMIT_ERROR

NEW TEST
initial string:
<input type="a:button" value="javascript:update google doc">
ok... ran the regex replace callback... string is now:
<input type="a:button" value="javascript:update google doc">
the last regex error was: 0
that is a PREG_NO_ERROR

NEW TEST
initial string:
<input type="button" value="javascript:update google doc" <# this makes it match #> onclick="javascript:getgoogledoc();">
matched something...
ok... ran the regex replace callback... string is now:
<input type="button" value="javascript:update google doc" <# this makes it match #> onclick="javascript:getgoogledoc();">
the last regex error was: 0
that is a PREG_NO_ERROR

NEW TEST
initial string:
<input type="button" value="javascript:update google doc" <# this makes it match #> onclick="getgoogledoc();">
matched something...
ok... ran the regex replace callback... string is now:
<input type="button" value="javascript:update google doc" <# this makes it match #> onclick="getgoogledoc();">
the last regex error was: 0
that is a PREG_NO_ERROR
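In the failing tests the line after "string is now:" is blank because preg_replace_callback() returns NULL when PCRE gives up, and the script overwrites $string with that NULL. On a real form that would silently wipe out the whole form body. A minimal guard, sketched here with a hypothetical helper name safe_replace_callback() and assuming PHP 7 or later for the type declarations, checks for NULL and throws instead:

```php
<?php
// Hypothetical helper: like preg_replace_callback(), but throws instead of
// returning NULL when PCRE fails (e.g. PREG_BACKTRACK_LIMIT_ERROR).
function safe_replace_callback(string $pattern, callable $callback, string $subject): string
{
    $result = preg_replace_callback($pattern, $callback, $subject);
    if ($result === null) {
        throw new RuntimeException(
            'preg_replace_callback() failed, preg_last_error() = ' . preg_last_error()
        );
    }
    return $result;
}

// Usage: if the call throws, the assignment never happens and $form keeps
// its original markup.
$form = '<input type="button" onclick="javascript:getgoogledoc();">';
try {
    // Placeholder pattern and callback; a real callback would rewrite the element.
    $form = safe_replace_callback('/<input\b[^>]*+>/i', function (array $m): string {
        return $m[0];
    }, $form);
} catch (RuntimeException $e) {
    error_log($e->getMessage());
}
echo $form, "\n";
```

Throwing keeps the failure visible in logs, while a caller that prefers to degrade gracefully can catch the exception and keep serving the unmodified form.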

1 Answer

Answer 0 (score: 2)

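PREG_BACKTRACK_LIMIT_ERROR happens because of excessive backtracking, which can be handled with possessive quantifiers. A possessive *+ keeps everything it has matched and never gives any of it back, so when no <# follows the attributes, the engine cannot go back and retry every other way of splitting them before it gives up. Try this modification to the regex (note the + I added at the position marked by ^, right after the * that closes the first attribute group and just before \s*)<#):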

'/<\s*input\s+((\s*(\w+)\s*=\s*(\'(\\\\\\\\|\\\\\'|[^\'])*\'|"(\\\\\\\\|\\\\"|[^"])*"|(\w+))|\s*(\w+))*+\s*)<#\s*(.*?)\s*#>((\s*(\w+)\s*=\s*(\'(\\\\\\\\|\\\\\'|[^\'])*\'|"(\\\\\\\\|\\\\"|[^"])*"|(\w+))|\s*(\w+))*\s*)(\/\s*|)>/is'
                                                                                                       ^
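The change above makes only the first attribute group possessive. Taking the same idea further, the sketch below is a hypothetical rewrite (not the answerer's code, and it drops the original capture groups) that makes every repeat possessive and describes the tag body more loosely: runs of ordinary characters, quoted values, or <#...#> tags. Every input element is matched in a single forward pass and the callback simply skips those without <#, so there is no failed match for the engine to backtrack through. It assumes that an unquoted < inside an input tag only ever starts a <#...#> tag, and it needs PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical pattern: an <input ...> element whose body is a sequence of
//   - runs of characters other than < > " '
//   - double- or single-quoted values (which may contain > or <#...#>)
//   - <#...#> tags
// Every repeat is possessive (++, *+), so nothing is ever given back.
const INPUT_TAG = '/<input\b(?:[^<>"\']++|"[^"]*+"|\'[^\']*+\'|<#.*?#>)*+>/is';

/** Return every <input> element in $html that contains "<#" anywhere. */
function find_tagged_inputs(string $html): array
{
    $found = [];
    $result = preg_replace_callback(
        INPUT_TAG,
        function (array $m) use (&$found): string {
            if (strpos($m[0], '<#') !== false) {
                $found[] = $m[0];
            }
            return $m[0]; // leave the markup unchanged
        },
        $html
    );
    if ($result === null) {
        throw new RuntimeException('PCRE error ' . preg_last_error());
    }
    return $found;
}

$plain  = '<input type="button" value="update google doc" onclick="javascript:getgoogledoc();">';
$tagged = '<input type="button" value="javascript:update google doc" <# this makes it match #> onclick="javascript:getgoogledoc();">';

var_dump(count(find_tagged_inputs($plain)));  // int(0)
var_dump(count(find_tagged_inputs($tagged))); // int(1)
```

Both test strings come from the question; the first is one that produced PREG_BACKTRACK_LIMIT_ERROR with the original pattern.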