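How can I remove a specific HTML tag from a string, but only inside certain other HTML tags?
We want to remove all <br /> tags between <strong> and </strong>, but leave all other markup, and all <br /> tags outside of <strong></strong>, untouched.
Example string: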
$t = "Optional start 1 <strong>bold stuff 1<br /></strong> optional end 1 and optional start 2 <strong>bold stuff 2<br /></strong>optional end 2 and break follows now <br />and text continues here in a new line, new strong comes here: <strong>here I am, very bold <br /> and I am still bold after a break.</strong>";
Approach:
$x = preg_replace("@(.*)(\<strong\>)(.*)\<br \/\>(.*)(\<\/strong\>)(.*)@U", "$1$2$3$4$5$6", $t);
Result:
$x = "Optional start 1 <strong>bold stuff 1</strong> optional end 1 and optional start 2 <strong>bold stuff 2</strong>optional end 2 and break follows now <br />and text continues here in a new line, new strong comes here: <strong>here I am, very bold  and I am still bold after a break.</strong>";
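My approach does seem to work, but I am not at all sure whether it holds up in more complex cases, or whether it still works when the string contains markup errors. There may also be a better solution.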
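One case where the single-pattern approach appears to go wrong: if an earlier <strong> block contains no <br />, the lazy groups can stretch across its </strong> and remove a <br /> that sits outside. The pattern also removes at most one <br /> per match. A possible way around both issues, still with regular expressions, is to match each <strong>...</strong> pair on its own and strip the breaks only inside that match. The sketch below assumes <strong> elements are not nested; the function name stripBrInsideStrong is made up for illustration.

```php
<?php
// Hypothetical helper, not part of the original post.
// Assumes <strong> elements are not nested inside each other.
function stripBrInsideStrong(string $html): string
{
    // Match each <strong>...</strong> pair separately (lazy, case-insensitive,
    // dot also matches newlines), then delete every <br> variant inside it.
    $result = preg_replace_callback(
        '@<strong\b[^>]*>.*?</strong>@is',
        function (array $m): string {
            return preg_replace('@<br\s*/?>@i', '', $m[0]);
        },
        $html
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

$sample = '<strong>a</strong> x <br /> y <strong>b<br />c<br /></strong>';
echo stripBrInsideStrong($sample), "\n";
// prints: <strong>a</strong> x <br /> y <strong>bc</strong>
```

This keeps the <br /> between the two bold blocks and removes both breaks inside the second one. It is still pattern matching rather than parsing, so broken or nested markup can defeat it.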
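Edit
Since parsing HTML with regular expressions is not the best idea, judging by the comments and some other posts I have come across since, I tried: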
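// loadXML() needs well-formed XML with a single root, so wrap the fragment.
$DomDoc = new DOMDocument();
$DomDoc->loadXML('<root>' . $t . '</root>');
$xpath = new DOMXPath($DomDoc);
// '//strong//br' selects every <br /> nested anywhere inside a <strong>.
foreach (iterator_to_array($xpath->query('//strong//br')) as $node) {
    $node->parentNode->removeChild($node);
}
// Serialize only the children of the wrapper to get the fragment back.
$x = '';
foreach ($DomDoc->documentElement->childNodes as $child) {
    $x .= $DomDoc->saveXML($child);
}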
So far this seems to work fine.
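Because loadXML() expects well-formed XML, a variant built on loadHTML() may cope better with the markup errors mentioned earlier. The following is only a sketch under a few assumptions: the function name stripBrInsideStrongDom and the wrapper id "wrap" are invented for illustration, the input is assumed to be ASCII (encoding handling is left out), and the fragment is wrapped in a <div> so it can be pulled back out of the <html><body> skeleton that the parser adds.

```php
<?php
// Hypothetical helper, not part of the original post.
// Assumes ASCII input; character-encoding handling is deliberately omitted.
function stripBrInsideStrongDom(string $html): string
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally instead of emitting them for sloppy markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<div id="wrap">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Copy the matches to an array first, then remove each <br> found inside a <strong>.
    foreach (iterator_to_array($xpath->query('//strong//br')) as $br) {
        $br->parentNode->removeChild($br);
    }

    // Serialize only the children of the wrapper <div>, not the whole document.
    $wrap = $xpath->query('//div[@id="wrap"]')->item(0);
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }

    return $out;
}

echo stripBrInsideStrongDom('x <br /> y <strong>b<br />c</strong>'), "\n";
```

Note that saveHTML() writes the surviving breaks as <br> rather than <br />, so the output is not byte-for-byte identical to the input outside the <strong> elements.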