Removing HTML tags with DOMDocument

Time: 2014-09-02 20:57:06

Tags: php html domdocument

I want to remove the <font> tags from my HTML and tried to do it with replaceChild, but it doesn't seem to work. Can anyone spot what might be going wrong?

$html = '<html><body><br><font class="heading2">Limited Size and Resources</font><p><br><strong>Q: When can a member use the limited size and resources exception?</strong></p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$font_tags = $dom->GetElementsByTagName('font');

foreach($font_tags as $font_tag) {
  foreach($font_tag as $child) {
    $child->replaceChild($child->nodeValue, $font_tag);
  }
}

echo $dom->saveHTML();

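As I understand it, $font_tags is a DOMNodeList, so I need to iterate twice before I can use the DOMNode::replaceChild function. I then want to replace the current value with the content inside the tag. However, when I output $html, nothing has changed. Any idea what might be wrong?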

Here is a PHP Sandbox for testing the code.

3 answers:

Answer 0: (score: 2)

I'll put my comments inline:

$html = '<html><body><br><font class="heading2">Limited Size and Resources</font><p><br><strong>Q: When can a member use the limited size and resources exception?</strong></p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$font_tags = $dom->GetElementsByTagName('font');

/* You only need one loop, as it is iterating your collection 
   You would only need a second loop if each font tag had children of their own
*/
foreach($font_tags as $font_tag) {
  /* replaceChild replaces children of the node being called
     So, to replace the font tag, call the function on its parent
     $prent will be that reference
  */
  $prent = $font_tag->parentNode;
   /* You can't insert arbitrary text, you have to create a textNode
      That textNode must also be a member of your document
   */
  $prent->replaceChild($dom->createTextNode($font_tag->nodeValue), $font_tag);

}

echo $dom->saveHTML();

Updated Sandbox: Hopefully I understood your requirements correctly
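
One caveat if the document contains more than one <font> tag: the list returned by getElementsByTagName is live, so replacing nodes while iterating over it can cause elements to be skipped. A minimal sketch, assuming the list is first copied into a plain array with iterator_to_array (the sample HTML here is made up for illustration and is not from the question):

```php
<?php
// Illustrative input with two <font> tags (not the question's HTML).
$html = '<html><body><font>One</font><font>Two</font><p>Three</p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// Snapshot the live DOMNodeList so replacements do not disturb the iteration.
$font_tags = iterator_to_array($dom->getElementsByTagName('font'));

foreach ($font_tags as $font_tag) {
    // Replace each <font> element with a text node holding its text content.
    $text = $dom->createTextNode($font_tag->nodeValue);
    $font_tag->parentNode->replaceChild($text, $font_tag);
}

$result = $dom->saveHTML();
echo $result;
```

With a single <font> tag, as in the question, the plain foreach above behaves the same; the snapshot only matters once several nodes are replaced in one pass.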

Answer 1: (score: 0)

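First, let's work out what is not working in your code.

  1. foreach($font_tag as $child) does not iterate even once, because $font_tag is a single 'font' tag element taken from the $font_tags collection, not a collection itself.

  2. $child->replaceChild($child->nodeValue, $font_tag); - a child node cannot replace its parent ($font_tag), although the reverse is possible, because replaceChild is a method that a parent node uses to replace one of its own children. For details, see the PHP: DOMNode::replaceChild documentation or point 2 below my code.

  3. echo $html would output the original $html string, not the updated $dom object that we are modifying.

  4. This works: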

    $html = '<html><body><br><font class="heading2">Limited Size and Resources</font><p><br><strong>Q: When can a member use the limited size and resources exception?</strong></p></body></html>';
    
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $font_tags = $dom->GetElementsByTagName('font');
    
    foreach($font_tags as $font_tag)
    {
        $new_node = $dom->createTextNode($font_tag->nodeValue);
        $font_tag->parentNode->replaceChild($new_node, $font_tag);
    }
    
    echo $dom->saveHTML();
    
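    1. I create $new_node directly through $dom, so the node belongs to the DOMDocument rather than existing only as a local value.

    2. To replace the child $font_tag, we first have to go up to its parent node using the parentNode property.

    3. Finally, we print the modified HTML using $dom's saveHTML method, which converts the DOMDocument into an HTML string.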

Answer 2: (score: 0)

Remove specific span tags from HTML with PHP and DOMDocument while preserving their inner content

<?php

$content = '<span style="font-family: helvetica; font-size: 12pt;"><div>asdf</div><span>TWO</span>Business owners are fearful of leading. They would rather follow the leader than embrace a bold move that challenges their confidence. </span>';

$dom = new DOMDocument();
// Use LIBXML for preventing output of doctype, <html>, and <body> tags
$dom->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($dom);

foreach ($xpath->query('//span[@style="font-family: helvetica; font-size: 12pt;"]') as $span) {

    // Move all span tag content to its parent node just before it.
    while ($span->hasChildNodes()) {
        $child = $span->removeChild($span->firstChild);
        $span->parentNode->insertBefore($child, $span);
    }

    // Remove the span tag.
    $span->parentNode->removeChild($span);
}

// Get the final HTML with span tags stripped
$output = $dom->saveHTML();

print_r($output);
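
The same move-the-children idea can be applied to the <font> tags from the question. This matters when a tag contains nested markup, because replacing it with a text node built from nodeValue keeps only the text. A rough sketch, assuming a hypothetical helper named unwrap_tag (not part of DOMDocument) and a sample input invented for illustration:

```php
<?php
// Hypothetical helper: removes every <$tagName> element but keeps its children in place.
function unwrap_tag(DOMDocument $dom, string $tagName): void
{
    // Copy the live list first, because the loop removes nodes from the document.
    $elements = iterator_to_array($dom->getElementsByTagName($tagName));

    foreach ($elements as $element) {
        // Move each child in front of the element, preserving their order.
        while ($element->firstChild !== null) {
            $element->parentNode->insertBefore($element->firstChild, $element);
        }
        // The element is now empty, so it can be dropped.
        $element->parentNode->removeChild($element);
    }
}

$dom = new DOMDocument();
// Illustrative input: a <font> tag with nested markup.
$dom->loadHTML('<html><body><font class="heading2">Limited <b>Size</b></font><p>Text</p></body></html>');

unwrap_tag($dom, 'font');

$result = $dom->saveHTML();
echo $result;
```

Unlike the text-node replacement, this keeps the nested <b> element intact while still dropping the <font> wrapper.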