How can I use PHP to remove tags that contain no text?
For example,
<div class="box"></div>
remove
<a href="#"></a>
remove
<p><a href="#"></a></p>
remove
<span style="..."></span>
remove
But I want to keep tags that do contain a text node,
<a href="#">link</a>
keep
Edit:
I want to remove clutter like this,
<p><strong><a href="http://xx.org.uk/dartmoor-arts"></a></strong></p>
<p><strong><a href="http://xx.org.uk/depw"></a></strong></p>
<p><strong><a href="http://xx.org.uk/devon-guild-of-craftsmen"></a></strong></p>
I tested the two regular expressions below,
$content = preg_replace('!<(.*?)[^>]*>\s*</\1>!','',$content);
$content = preg_replace('%<(.*?)[^>]*>\\s*</\\1>%', '', $content);
but they leave behind things like this,
<p><strong></strong></p>
<p><strong></strong></p>
<p><strong></strong></p>
Answer 0 (score: 3)
One approach might be:
$dom = new DOMDocument();
$dom->loadHTML(
'<p><strong><a href="http://xx.org.uk/dartmoor-arts">test</a></strong></p>
<p><strong><a href="http://xx.org.uk/depw"></a></strong></p>
<p><strong><a href="http://xx.org.uk/devon-guild-of-craftsmen"></a></strong></p>'
);
$xpath = new DOMXPath($dom);
// Select elements that have no child nodes at all (not(node()) alone would already cover not(text())).
// Removing a childless element can leave its parent childless, so query again until nothing matches.
while (($nodeList = $xpath->query('//*[not(text()) and not(node())]')) && $nodeList->length > 0) {
    foreach ($nodeList as $node) {
        $node->parentNode->removeChild($node);
    }
}
echo $dom->saveHTML();
You may need to adapt it a little to your needs.
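As one possible adaptation, the DOM approach above can be wrapped in a function that works on an HTML fragment, also treats whitespace-only elements as empty, and leaves void elements such as <br> and <img> alone. This is only a sketch: the function name removeEmptyElements, the wrapper div, and the list of protected void elements are assumptions for illustration, not part of the original answer, and the input is assumed to be ASCII or otherwise safe for DOMDocument's default encoding handling.

```php
<?php
// Hypothetical helper (not from the original answer): strips elements that
// have no element children and no non-whitespace text.
function removeEmptyElements(string $html): string
{
    $dom = new DOMDocument();

    // Collect parser warnings instead of printing them for imperfect markup.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a single root so only its children are serialized later.
    $dom->loadHTML(
        '<div id="wrapper">' . $html . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $wrapper = $dom->documentElement;
    $xpath = new DOMXPath($dom);

    // Descendants of the wrapper with no element children and no visible text,
    // excluding a few void elements that are meaningful even when "empty".
    $query = './/*[not(*) and not(normalize-space())'
           . ' and not(self::br or self::hr or self::img or self::input)]';

    // Removing a leaf can turn its parent into a leaf, so repeat until stable.
    while (($nodes = $xpath->query($query, $wrapper)) && $nodes->length > 0) {
        foreach ($nodes as $node) {
            $node->parentNode->removeChild($node);
        }
    }

    // Serialize the wrapper's children, leaving out the wrapper itself.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo removeEmptyElements(
    '<p><strong><a href="http://xx.org.uk/depw"></a></strong></p><a href="#">link</a>'
);
```

Querying relative to the wrapper keeps the wrapper itself from being removed when every element in the fragment turns out to be empty.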
Answer 1 (score: 0)
You can do a regular expression replacement like this:
// $original holds the HTML; keep replacing until a pass changes nothing
$updated = "";
while ($updated !== $original) {
    $updated = $original;
    $original = preg_replace('!<(.*?)[^>]*>\s*</\1>!', '', $updated);
}
Putting it in a while loop should fix it, because each pass removes the tags that the previous pass left empty.
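A variation on the loop above, as a rough sketch: preg_replace can report how many replacements it made through its fifth argument, so the loop can stop when that count reaches zero instead of comparing strings. The function name stripEmptyTags and the stricter tag-name pattern are assumptions of this sketch, not part of the original answer. Like any regex-based approach, it can misfire on attribute values that contain ">".

```php
<?php
// Hypothetical helper (not from the original answer).
function stripEmptyTags(string $html): string
{
    // The tag name must start with a letter, and \b keeps "<a" from
    // matching the start of a longer name such as "<abbr".
    $pattern = '~<([a-z][a-z0-9]*)\b[^>]*>\s*</\1\s*>~i';

    do {
        $next = preg_replace($pattern, '', $html, -1, $count);
        if ($next === null) {
            // Regex engine error: return what we have so far.
            return $html;
        }
        $html = $next;
    } while ($count > 0); // another pass is needed only if something was removed

    return $html;
}

echo stripEmptyTags(
    '<p><strong><a href="http://xx.org.uk/depw"></a></strong></p><a href="#">link</a>'
);
// prints: <a href="#">link</a>
```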
Answer 2 (score: 0)
You should buffer the PHP output and then process that output with a regular expression, like this:
// start buffering output
ob_start();
// do some output
echo '<div id="non-empty">I am not empty</div><a class="empty"></a>';
// at this point you want to output the contents to the client:
// fetch the buffered contents and discard the buffer without sending it,
// otherwise the unfiltered HTML would reach the client as well
$contents = ob_get_clean();
// replace empty html tags
$contents = preg_replace('%<(.*?)[^>]*>\\s*</\\1>%', '', $contents);
// echo the sanitized contents
echo $contents;
Let me know if this helps :)
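The buffering idea can also be expressed with an output callback: ob_start accepts a callable that receives the buffered output and returns the string to send. The sketch below assumes a hypothetical callback named filterEmptyTags and reuses a repeated replacement so nested empty tags are removed too. Neither the name nor the stricter pattern comes from the original answer.

```php
<?php
// Hypothetical output callback (not from the original answer): receives the
// buffered output and returns the filtered string that is actually sent.
function filterEmptyTags(string $buffer): string
{
    $pattern = '~<([a-z][a-z0-9]*)\b[^>]*>\s*</\1\s*>~i';
    do {
        $next = preg_replace($pattern, '', $buffer, -1, $count);
        if ($next === null) {
            // Regex engine error: send the buffer as it stands.
            return $buffer;
        }
        $buffer = $next;
    } while ($count > 0);

    return $buffer;
}

// Everything echoed from here on passes through the callback.
ob_start('filterEmptyTags');

echo '<div id="non-empty">I am not empty</div><p><a class="empty"></a></p>';

// Runs the callback on the buffer and sends the result to the client.
ob_end_flush();
```

With the callback in place, the page code does not need to fetch and re-echo the buffer itself.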