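I'm trying to remove a specific div (including its inner content) from a block of content, but it isn't working correctly.

The regex:

```
/<div class="greybackground_desktop".*>(.*)<\/div>/s
```

The preg_replace call:

```php
preg_replace($pattern, "", $holder, -1, $count);
```

The regex does strip my div. However, if there are any other closing div tags after it, it also strips those out, along with all the content in between.

For example: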
```html
<p>some random text</p>
<div class="greybackground_desktop" style="background-color:#EFEFEF;">
<!-- /49527960/CSF_Article_Middle -->
<div style="padding-bottom:10px; padding-top: 10px; text-align:center;" id='div-gpt-ad-1441883689230-0'>
<script type='text/javascript'>
googletag.cmd.push(function() { googletag.display('div-gpt-ad-1441883689230-0'); });
</script>
</div>
</div>
<p>some more text</p>
<div><p>example of content that will be incorrectly removed</p></div>
<p>Text that follows</p>
```
This produces the following output:

```
some random text
Text that follows
```

What I want to see is:

```
some random text
some more text
example of content that will be incorrectly removed
Text that follows
```

Any ideas?
Answer 0 (score: 3)

Use a parser such as DOMDocument. Consider the following code:
```php
<?php
$dom = new DOMDocument();
$dom->loadHTML($your_html_here);
$xpath = new DOMXPath($dom);
// find every div whose class attribute is exactly "greybackground_desktop" and remove it
foreach ($xpath->query("//div[@class='greybackground_desktop']") as $div) {
    $div->parentNode->removeChild($div);
}
echo $dom->saveHTML();
?>
```
This script loads your HTML, finds every div whose class attribute is greybackground_desktop, and removes those elements. A demo can be found on ideone.com.
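The query above compares the whole class attribute, so it would not match a div that carries additional classes (for example `class="greybackground_desktop ad-slot"`). A minimal sketch of the usual XPath token-matching idiom follows; the sample markup and the extra `ad-slot` class are hypothetical, made up for illustration.

```php
<?php
// Hypothetical sample: the target div has an extra class ("ad-slot"),
// so an exact @class='greybackground_desktop' comparison would miss it.
$html = '<p>keep</p>'
      . '<div class="greybackground_desktop ad-slot"><div>ad</div></div>'
      . '<div><p>also keep</p></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Match "greybackground_desktop" as one whitespace-separated token of @class.
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' greybackground_desktop ')]";

// Copy the matches into a plain array before modifying the tree.
foreach (iterator_to_array($xpath->query($query)) as $div) {
    $div->parentNode->removeChild($div);
}

$out = $dom->saveHTML();
echo $out;
```

The padding with spaces on both sides prevents a partial match such as `greybackground_desktop_old`.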
Answer 1 (score: 1)

The correct approach is to use an HTML parser such as DOMDocument. Here is an example:
```php
<?php
$holder = <<<LOL
<p>some random text</p>
<div class="greybackground_desktop" style="background-color:#EFEFEF;">
<!-- /49527960/CSF_Article_Middle -->
<div style="padding-bottom:10px; padding-top: 10px; text-align:center;" id='div-gpt-ad-1441883689230-0'>
<script type='text/javascript'>
googletag.cmd.push(function() { googletag.display('div-gpt-ad-1441883689230-0'); });
</script>
</div>
</div>
<p>some more text</p>
<div><p>example of content that will be incorrectly removed</p></div>
<p>Text that follows</p>
LOL;

$dom = new DOMDocument();
// avoid leftover whitespace after removing the node
$dom->preserveWhiteSpace = false;
// parse the HTML into DOM elements
$dom->loadHTML($holder);
// get the first div in the document (in this sample, the greybackground_desktop div)
if ($div = $dom->getElementsByTagName('div')->item(0)) {
    // remove the node by asking its parent node to remove the child
    $div->parentNode->removeChild($div);
    // save the modified document
    echo $dom->saveHTML();
}
```
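Because `$holder` is only a fragment, `loadHTML()` wraps it in a doctype plus `<html>` and `<body>` elements, and `saveHTML()` with no argument prints all of that. If only the cleaned fragment is wanted, one option is to serialize the children of `<body>` one at a time with `saveHTML($node)`. The sketch below uses a shortened, hypothetical sample and selects the div by class with the XPath query from the first answer rather than relying on it being the first div.

```php
<?php
// Hypothetical, shortened stand-in for $holder.
$holder = '<p>some random text</p>'
        . '<div class="greybackground_desktop"><div>ad</div></div>'
        . '<p>some more text</p>';

$dom = new DOMDocument();
$dom->loadHTML($holder);

// Remove the target div by class, not by position.
$xpath = new DOMXPath($dom);
foreach (iterator_to_array($xpath->query("//div[@class='greybackground_desktop']")) as $div) {
    $div->parentNode->removeChild($div);
}

// loadHTML() added <!DOCTYPE>, <html> and <body> around the fragment;
// serialize only the children of <body> to get the fragment back.
$body = $dom->getElementsByTagName('body')->item(0);
$result = '';
foreach ($body->childNodes as $child) {
    $result .= $dom->saveHTML($child);
}

echo $result;
```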
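If you really want to use a regex, use the lazy quantifier `.*?` instead of the greedy `.*`, i.e.:

```php
$result = preg_replace('%<div class="greybackground_desktop".*?</div>\s+</div>%si', '', $holder);
```

Read more about regex repetition, in particular "Laziness Instead of Greediness".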
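To see the difference, here is a small sketch that runs the question's greedy pattern and this answer's lazy pattern over a shortened, hypothetical input. Note that the lazy pattern relies on the target div containing exactly one nested div (the `</div>\s+</div>` pair); with a different nesting depth it would stop in the wrong place, which is why a parser is the more robust option.

```php
<?php
// Hypothetical input: the target div (with one nested div), then an unrelated div.
$holder = "<p>a</p>\n"
        . "<div class=\"greybackground_desktop\"><div>ad</div>\n</div>\n"
        . "<p>b</p>\n"
        . "<div><p>keep me</p></div>\n"
        . "<p>c</p>";

// Greedy (pattern from the question): .* runs as far as possible, so the
// match ends at the LAST </div> in the string and swallows "<p>b</p>" too.
$greedy = preg_replace('/<div class="greybackground_desktop".*>(.*)<\/div>/s', '', $holder);

// Lazy (pattern from this answer): .*? stops at the FIRST place where
// "</div>", whitespace, "</div>" can match, i.e. the end of the target div.
$lazy = preg_replace('%<div class="greybackground_desktop".*?</div>\s+</div>%si', '', $holder);

echo $greedy, "\n---\n", $lazy, "\n";
```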