PHP - Regex to remove all occurrences of event attributes

Posted: 2014-02-24 08:51:10

Tags: php regex

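After several hours of trying, I'm asking here. I want to remove every JavaScript event attribute (onclick, ondblclick, onmouseover, ...) and every style attribute from POSTed text. The text may or may not contain newlines.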

Sample POSTed text:

<a href="http://www.google.com" onclick="unwanted_code" style="unwanted_style" ondblclick="unwanted_code" onmouseover="unwanted_code">google</a> is a search engine. There are other engines too. <a href="http://www.yahoo.com" onclick="unwanted_code" ondblclick="unwanted_code" onmouseover="unwanted_code" style="unwanted_style">yahoo</a> is another engine.

First attempt:

$pattern[0] = '/(<[^>]+) on.*=".*?"/iU';
$replace[0] = '$1';
$pattern[1] = '/(<[^>]+) style=".*?"/iU';
$replace[1] = '$1';
$out = preg_replace($pattern, $replace, $in);

Output:

<a href="http://www.google.com">yahoo</a> is another engine.

Second attempt:

$out = preg_replace_callback('/(<[^>]+) on.*=".*?"/iU', function($m) {return $m[1];}, $in);

Output:

<a href="http://www.google.com">yahoo</a> is another engine.
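A likely reason both attempts swallow everything up to the last quoted value in the input: the `U` modifier inverts greediness, so `.*` becomes lazy and `.*?` becomes greedy. The `".*?"` part therefore runs to the last `"` on the line. A minimal sketch of the effect (the input string is made up for illustration):

```php
<?php
$s = 'x a="1" b="2"';

// Without U: .*? is lazy and stops at the first closing quote.
preg_match('/".*?"/', $s, $lazy);
echo $lazy[0], "\n";      // "1"

// With U: greediness is inverted, so .*? is now greedy and runs to the last quote.
preg_match('/".*?"/U', $s, $inverted);
echo $inverted[0], "\n";  // "1" b="2"
```

Dropping `U` (or keeping `U` and writing plain `.*`) restores the intended lazy match; a negated class such as `"[^"]*"` avoids the question entirely.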

The output I want is:

<a href="http://www.google.com">google</a> is a search engine. There are other engines too. <a href="http://www.yahoo.com">yahoo</a> is another engine.
Can anyone help?

3 answers:

Answer 0 (score: 3)

How about this:

$content = '<a href="http://www.google.com" onclick="unwanted_code" style="unwanted_style" ondblclick="unwanted_code" onmouseover="unwanted_code">google</a> is a search engine. There are other engines too. <a href="http://www.yahoo.com" onclick="unwanted_code" ondblclick="unwanted_code" onmouseover="unwanted_code" style="unwanted_style">yahoo</a> is another engine.';

$result = preg_replace('%(<a href="[^"]+")[^>]+(>)%m', "$1$2", $content);
echo $result,"\n";

Output:

<a href="http://www.google.com">google</a> is a search engine. There are other engines too. <a href="http://www.yahoo.com">yahoo</a> is another engine.
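This pattern relies on `href` being the first attribute of each `<a>` tag, which holds for the question's sample. A quick check of that assumption (the `x.test` URL is made up for illustration):

```php
<?php
$pattern = '%(<a href="[^"]+")[^>]+(>)%m';

// href comes first: everything between the href value and ">" is dropped.
$a = preg_replace($pattern, '$1$2', '<a href="http://x.test" onclick="evil()">x</a>');
echo $a, "\n"; // <a href="http://x.test">x</a>

// href is not first: "<a href" never appears, so nothing is removed.
$b = preg_replace($pattern, '$1$2', '<a onclick="evil()" href="http://x.test">x</a>');
echo $b, "\n"; // <a onclick="evil()" href="http://x.test">x</a>
```

So this works for input shaped like the sample, but it is not a general filter for arbitrary POSTed HTML.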

Answer 1 (score: 2)

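Even though this question is tagged regex, I'll add this answer anyway, because it is more robust for input validation: this particular solution accepts only certain tags and restricts which attributes they may carry.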

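// $html holds the POSTed input
$doc = new DOMDocument();
libxml_use_internal_errors(true); // don't emit warnings for imperfect markup
$doc->loadHTML('<html><body>' . $html . '</body></html>');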

$allowedTags = ['a' => ['href']];

$body = $doc->getElementsByTagName('body')->item(0);

$elements = $body->getElementsByTagName('*');
for ($k = 0; $element = $elements->item($k); ) {
    $name = strtolower($element->nodeName);
    if (isset($allowedTags[$name])) {
        $allowedAttributes = $allowedTags[$name];
        for ($i = 0; $attribute = $element->attributes->item($i); ) {
            if (!in_array($attribute->nodeName, $allowedAttributes)) {
                $element->removeAttribute($attribute->nodeName);
                continue;
            }
            ++$i;
        }
    } else {
        $element->parentNode->removeChild($element);
        continue;
    }
    ++$k;
}

$result = '';

foreach ($body->childNodes as $childNode) {
    $result .= $doc->saveXML($childNode);
}

echo $result;
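The allowlist above also deletes every element that is not listed, together with its contents. If the goal is exactly what the question asks (drop `on*` and `style` attributes but keep every tag), a denylist variant using `DOMXPath` is one option. A sketch, assuming ASCII input (without a charset declaration, `loadHTML` treats input as ISO-8859-1); `strip_event_attributes()` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: removes on* event attributes and style attributes, keeps all tags.
function strip_event_attributes(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate imperfect markup
    $doc->loadHTML('<html><body>' . $html . '</body></html>');
    libxml_clear_errors();

    // Every attribute whose name starts with "on", plus every style attribute.
    // XPath results are a snapshot, so removing nodes while looping is safe.
    $xpath = new DOMXPath($doc);
    $attrs = $xpath->query('//@*[starts-with(name(), "on")] | //@style');
    foreach ($attrs as $attr) {
        $attr->ownerElement->removeAttributeNode($attr);
    }

    // Serialize only the children of <body>, not the wrapper we added.
    $body = $doc->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$in = '<a href="http://www.google.com" onclick="unwanted_code" style="unwanted_style">google</a> is a search engine.';
echo strip_event_attributes($in), "\n";
```

Matching the prefix `on` covers every event handler without listing them one by one, which is where a fixed list tends to fall behind.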

Answer 2 (score: 0)

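Since you want to keep one attribute (href), you can't simply remove all of them. This code does what you want, but it requires listing every unwanted attribute: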

// \s+ also removes the space before each attribute; [^"]* allows empty values
$out = preg_replace('#\s+(onclick|style|ondblclick|onmouseover)="[^"]*"#i', '', $in);

It could probably be simplified, but it works :)
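One possible simplification: match any attribute name starting with `on` instead of enumerating each handler. A sketch, assuming attribute values are double-quoted as in the question's sample; it does not handle single-quoted or unquoted values and does not check that the match is inside a tag:

```php
<?php
$in = '<a href="http://www.google.com" onclick="unwanted_code" style="unwanted_style" ondblclick="unwanted_code" onmouseover="unwanted_code">google</a> is a search engine.';

// \s+              the whitespace before the attribute, so no stray spaces remain
// (?:on\w+|style)  any event-handler name (onclick, onmouseover, ...) or style
// \s*=\s*"[^"]*"   the "=" and a double-quoted value, possibly empty
$out = preg_replace('/\s+(?:on\w+|style)\s*=\s*"[^"]*"/i', '', $in);
echo $out, "\n"; // <a href="http://www.google.com">google</a> is a search engine.
```

For untrusted input, a DOM-based filter is generally safer than any single regex.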