How to remove an entire div using preg_replace

Asked: 2012-05-18 19:39:55

Tags: php html html-parsing

OK, since this is a WordPress problem, I unfortunately need to remove every occurrence of a parent div together with everything inside it:

<div class="sometestclass">
   <img ....>
   <div>.....</div>
   any other html tags
</div><!-- END: .sometestclass -->

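My only idea is to match everything that starts with:

<div class="sometestclass">

and ends with:

<!-- END: .sometestclass -->

including everything in between (I can mark the end of the parent div however I like; this is just an example). Does anyone know how to do this with: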

<?php $content = preg_replace('?????','',$content); ?>

4 answers:

Answer 0 (score: 9)

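I wouldn't use a regular expression. Instead, I would use the DOMDocument class. Just find all div elements with that class and remove them from their parent: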

$html = "<p>Hello World</p>
         <div class='sometestclass'>
           <img src='foo.png'/>
           <div>Bar</div>
         </div>";

$dom = new DOMDocument;
$dom->loadHTML( $html );

$xpath = new DOMXPath( $dom );
$pDivs = $xpath->query(".//div[@class='sometestclass']");

foreach ( $pDivs as $div ) {
  $div->parentNode->removeChild( $div );
}

echo preg_replace( "/.*<body>(.*)<\/body>.*/s", "$1", $dom->saveHTML() );

The result is:

<p>Hello World</p>
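
One caveat with the query above: @class='sometestclass' only matches when the attribute is exactly that string. The sketch below is a variation, not part of the original answer, that matches the class as one token among several. The sample markup and the extra class name "wide" are assumptions made up for illustration.

```php
<?php
// Hypothetical sample markup: the target div carries two class names.
$html = "<p>Hello World</p>
         <div class='sometestclass wide'>
           <img src='foo.png'/>
           <div>Bar</div>
         </div>";

$dom = new DOMDocument;
$dom->loadHTML( $html );

$xpath = new DOMXPath( $dom );
// Pad the attribute with spaces so 'sometestclass' is matched as a whole
// token and not as a substring of some longer class name.
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' sometestclass ')]";

foreach ( $xpath->query( $query ) as $div ) {
  $div->parentNode->removeChild( $div );
}

// Full serialized document, including the html/body wrapper libxml adds.
$output = $dom->saveHTML();
echo $output;
```

The same loop still works for the single-class case, so this query can stand in for the simpler one if the markup is not under your control.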

Answer 1 (score: 6)

<?php $content = preg_replace('/<div class="sometestclass">.*?<\/div><!-- END: .sometestclass -->/s','',$content); ?>

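My RegEx is a little rusty, but I think this should work. Note that, as others have said, RegEx is not properly equipped to handle some of the complexities of HTML.

Also, this pattern will not correctly handle a sometestclass div nested inside another sometestclass div: the lazy match stops at the first END comment. You would need recursion for that.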
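
For what it's worth, here is a rough sketch of what a recursive pattern could look like with PCRE's (?1) subroutine call. It is an illustration under assumptions, not a robust HTML parser: it assumes the opening tag is written exactly as <div class="sometestclass">, that every div is properly closed, and it does not rely on the END comment at all. The sample input is made up.

```php
<?php
// Hypothetical input: a sometestclass div nested inside another one.
$content = '<p>Hello</p>'
         . '<div class="sometestclass"><img src="foo.png">'
         . '<div>Bar <div class="sometestclass">x</div></div>'
         . '</div><p>Bye</p>';

// Group 1 matches one balanced <div>...</div> and calls itself through (?1)
// for every div nested inside it. Anything that is not the start of a div
// tag (opening or closing) is consumed one character at a time.
$pattern = '~<div class="sometestclass">'
         . '(?:(?:(?!</?div\b).)++|(<div\b[^>]*>(?:(?:(?!</?div\b).)++|(?1))*+</div>))*+'
         . '</div>~s';

$result = preg_replace( $pattern, '', $content );

// preg_replace returns null on failure (for example when a PCRE limit is
// hit), so fall back to the untouched content in that case.
echo null === $result ? $content : $result;
```

On large or malformed documents this can fail or hit PCRE limits, which is one more reason the DOM-based approach is usually the safer choice.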

Answer 2 (score: 0)

How about some CSS: .sometestclass{display: none;}?

Answer 3 (score: 0)

For the UTF-8 problem, I found a hack in the PHP manual.

So my function looks like this:

function rem_fi_cat() {
/* This function removes images from _within_ the article.
 * If these images are enclosed in a "wp-caption" div-tag.
 * If the articles are post formatted as "image".
 * Only on home-page, front-page and in category/archive-pages.
 */
if ( (is_home() || is_front_page() || is_category()) && has_post_format( 'image' ) ) {
    $document = new DOMDocument();
    $content = get_the_content( '', true );
    if( '' != $content ) {
        /* incl. UTF-8 "hack" as described at 
         * http://www.php.net/manual/en/domdocument.loadhtml.php#95251
         */
        $document->loadHTML( '<?xml encoding="UTF-8">' . $content );
        // Note: the variable is $document; $doc was undefined here.
        foreach ($document->childNodes as $item) {
            if ($item->nodeType == XML_PI_NODE) {
                $document->removeChild($item); // remove hack
                $document->encoding = 'UTF-8'; // insert proper
            }
        }
        $xpath = new DOMXPath( $document );
        $pDivs = $xpath->query(".//div[@class='wp-caption']");

        foreach ( $pDivs as $div ) {
            $div->parentNode->removeChild( $div );
        }

        echo preg_replace( "/.*<div class=\"entry-container\">(.*)<\/div>.*/s", "$1", $document->saveHTML() );

    }
}

}
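
If it helps, here is a minimal sketch of the same idea with the WordPress calls taken out, so it can be tried on its own. The helper name remove_divs_by_class and the sample string are hypothetical. It returns the cleaned markup instead of echoing it, and it serializes the children of body directly, so it does not depend on a wrapper such as entry-container being present. It assumes the input is a UTF-8 fragment.

```php
<?php
/**
 * Hypothetical helper: remove every div whose class attribute equals $class.
 * Assumes $html is a UTF-8 HTML fragment.
 */
function remove_divs_by_class( $html, $class ) {
    $document = new DOMDocument();

    // Collect parser warnings from imperfect markup instead of printing them.
    libxml_use_internal_errors( true );
    // Same encoding hack as above: the prepended declaration tells libxml
    // to read the fragment as UTF-8.
    $document->loadHTML( '<?xml encoding="UTF-8">' . $html );
    libxml_clear_errors();

    $xpath = new DOMXPath( $document );
    foreach ( $xpath->query( "//div[@class='" . $class . "']" ) as $div ) {
        $div->parentNode->removeChild( $div );
    }

    // libxml wraps fragments in html/body; return only what is inside body.
    $body = $document->getElementsByTagName( 'body' )->item( 0 );
    if ( null === $body ) {
        return '';
    }
    $out = '';
    foreach ( $body->childNodes as $node ) {
        $out .= $document->saveHTML( $node );
    }
    return $out;
}

// Made-up sample content with a non-ASCII character.
$clean = remove_divs_by_class(
    '<p>Grüße</p><div class="wp-caption"><img src="a.png"></div>',
    'wp-caption'
);
echo $clean;
```

Inside WordPress, a function like this would more naturally be called from a filter on the content than echoed from a template, but that wiring is left out here.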