Question

$body = preg_replace("/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i",'<$1$2>', $body);

Hello there. I found a preg_replace call that finds all HTML tags and removes their attributes. I need to exclude the <a> tag from that regexp, so for example:

<sth a="awdawd"/><a href="http://awdwsrrdg.com"/>

should be changed to:

<sth/><a href="http://awdwsrrdg.com"/>

Any help would be appreciated.

Answer 1

Don't use regular expressions to parse or modify HTML/XML. That only works for a few edge cases, not for real-world applications.

Use a DOM parser instead:

$html = '<sth a="awdawd"/><a href="http://awdwsrrdg.com"/>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$selector = new DOMXPath($doc);

// Select every attribute node whose parent element is not <a>, then remove it
foreach ($selector->query('//@*[not(parent::a)]') as $attr) {
    $attr->parentNode->removeAttribute($attr->nodeName);
}

echo $doc->saveHTML();
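
One thing to watch for: saveHTML() on the whole document also prints the doctype and the <html>/<body> wrappers that loadHTML() adds, and an unknown tag like <sth> makes libxml emit warnings. Below is a minimal sketch that returns only the fragment, assuming that is what you want back. The function name stripAttributesExceptAnchor is hypothetical, and the sketch does not handle empty input or non-Latin-1 encodings.

```php
<?php
// Hypothetical helper (not part of the answer above): removes every
// attribute except those on <a> elements and returns only the fragment.
function stripAttributesExceptAnchor(string $html): string
{
    $doc = new DOMDocument();

    // Unknown tags such as <sth> trigger libxml warnings; collect them
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//@*[not(parent::a)]') as $attr) {
        $attr->ownerElement->removeAttribute($attr->nodeName);
    }

    // Serialize only the children of <body>, skipping doctype/html/body.
    $body = $doc->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo stripAttributesExceptAnchor(
    '<p class="x">Hi <a href="http://awdwsrrdg.com">link</a></p>'
);
```

The class attribute on <p> should be gone while the href on <a> is kept.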

Answer 2

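It's well known that you shouldn't use regex to parse XHTML (use an HTML parser instead), since the engine can get confused by unusual characters unless you really know what set of characters you are facing.

On the other hand, if you still want to use a regex, you can leverage the discard technique with this pattern: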

<a\b.*?\/>(*SKIP)(*FAIL)|<(\w+).*?>


PHP code

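// Left alternative matches <a .../> and discards it; right alternative captures the tag name
$re = '/<a\b.*?\/>(*SKIP)(*FAIL)|<(\w+).*?>/';
$str = '<sth a="awdawd"/><a href="http://awdwsrrdg.com"/>';
$subst = '<$1 />'; // $1 is the captured tag name

$result = preg_replace($re, $subst, $str);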

If you want to keep your own regex, you can add the discard pattern at the beginning, like this:

<a\b.*?\/>(*SKIP)(*FAIL)|<([a-z][a-z0-9]*)[^>]*?(\/?)>
           ^------^-----Discard pattern flags
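
Here is a rough sketch of that combined pattern in use. It includes one assumed change that is not in the answer: `[^>]*>` instead of `.*?\/>`, so that <a ...> is skipped whether or not it is self-closing. As with any regex approach, an attribute value containing `>` would still break it.

```php
<?php
// Sketch only: the combined pattern with one assumed change,
// `[^>]*>` in place of `.*?\/>`, so non-self-closing <a ...> is skipped too.
$re = '/<a\b[^>]*>(*SKIP)(*FAIL)|<([a-z][a-z0-9]*)[^>]*?(\/?)>/i';

$inputs = [
    '<sth a="awdawd"/><a href="http://awdwsrrdg.com"/>',
    '<p class="x"><a href="u" title="t">link</a></p>',
];

foreach ($inputs as $str) {
    // $1 is the tag name, $2 is the optional trailing slash.
    echo preg_replace($re, '<$1$2>', $str), PHP_EOL;
}
// Prints:
// <sth/><a href="http://awdwsrrdg.com"/>
// <p><a href="u" title="t">link</a></p>
```

The left alternative matches an <a> tag and then forces a failure, and (*SKIP) makes the engine resume after that tag instead of retrying inside it, so only the right alternative ever produces a replacement.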

Answer 3

Try this regex:

/<([b-z][a-z0-9]*)[^>]*?(\/?)>/i

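Change the first group's rule from [a-z] to [b-z]. Now every tag whose name starts with a is ignored, which includes <a> but also tags such as <abbr> and <address>.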

$body = preg_replace("/<([b-z][a-z0-9]*)[^>]*?(\/?)>/i",'<$1$2>', $body);

WORKING DEMO

$pattern = /<([b-z][a-z0-9]*)[^>]*?(\/?)>/i

$replacement = <$1$2>

$text = <sth a="awdawd"/><a href="http://awdwsrrdg.com"/>

Output: <sth/><a href="http://awdwsrrdg.com"/>
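
Because [b-z] excludes by first letter, tags like <abbr> keep their attributes as well. If that matters, one possible alternative (an assumption here, not something this answer proposes) is a negative lookahead that rejects only the exact tag name a. A small comparison sketch, with hypothetical variable names:

```php
<?php
// Pattern from the answer: skips every tag whose name starts with "a".
$byFirstLetter = '/<([b-z][a-z0-9]*)[^>]*?(\/?)>/i';

// Assumed alternative (not from the answer): the negative lookahead
// (?!a\b) rejects only tags whose name is exactly "a".
$byLookahead = '/<(?!a\b)([a-z][a-z0-9]*)[^>]*?(\/?)>/i';

$text = '<abbr title="x">HTML</abbr><a href="u">link</a>';

echo preg_replace($byFirstLetter, '<$1$2>', $text), PHP_EOL;
// <abbr title="x">HTML</abbr><a href="u">link</a>

echo preg_replace($byLookahead, '<$1$2>', $text), PHP_EOL;
// <abbr>HTML</abbr><a href="u">link</a>
```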

Regexp to remove the attributes of all HTML tags except <a>
