Question

I have some HTML templates in this format:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 
<html xmlns="http://www.w3.org/1999/xhtml">
<head> 
    <title>myTitle</title> 
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />   
</head> 
<body bgcolor="#b23bba" style="background-color: #b23bba; margin: 0;"> 
    <table>
        <tr><td><img src="https://www.myurlname.com/anotherimg.jpg" /></td></tr>
        <tr><td>needed content</td></tr>
    </table>
    <img src="https://www.myurlname.com/e68f2e83c811d6bdb32876041a1cfa78.gif" width="1" height="1" />
</body>
</html>

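What I need to do is strip this template down, take only part of it, and insert that part into another template that already has the common HTML tags such as html, head, and body. What I really need is to keep only what sits between the body tags, but without the image whose height and width are both 1px.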

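For this particular case, I have to keep only the table. I should mention that I store all of this content in a PHP variable. Is there a solution for this?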

Answer 1

Well, given that you have a complete and valid DOM, you can parse it, query for the <body> node, and store it. It only takes a few lines of code using the DOMDocument class:

$dom = new DOMDocument;
$dom->loadHTML($str);
$contents = $dom->getElementsByTagName('body')->item(0);
$bodyContents = $dom->saveXML($contents);

This will produce:

<body><!-- your markup here --></body>

To remove the body tags, a simple substr call is enough:

$clean = substr($bodyContents, 6, -7);

That's it! Here's a fuller example, by the way.
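As a variation on the steps above, the contents of <body> can also be collected child by child, which avoids counting characters for substr. This is only a minimal sketch: the helper name bodyInnerHtml and the sample markup are made up for illustration, and it assumes PHP 5.3.6 or later, where saveHTML() accepts a node argument.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the markup
// found inside <body>, without the <body> tags themselves.
function bodyInnerHtml($html)
{
    $dom = new DOMDocument;
    // Collect libxml warnings internally instead of printing them,
    // since real-world templates are rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $inner = '';
    foreach ($body->childNodes as $child) {
        // saveHTML() with a node argument serializes only that node.
        $inner .= $dom->saveHTML($child);
    }
    return trim($inner);
}

// Sample input, shortened from the template in the question.
$str = '<html><head><title>myTitle</title></head>'
     . '<body bgcolor="#b23bba"><table><tr><td>needed content</td></tr></table></body></html>';

echo bodyInnerHtml($str), "\n";
```

Because the <body> element itself is never serialized, any attributes on it are irrelevant with this approach.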

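Of course, if your <body> tag might carry attributes, you have to remove those first, because the substr offsets above assume a bare <body> tag. In general, something like this should work: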

$body = $dom->getElementsByTagName('body')->item(0);
if ($body->hasAttributes())
{
    foreach($body->attributes as $attr)
    {
        $body->removeAttributeNode($attr);
    }
}

All of this is documented pretty well on the official PHP pages.
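One caveat with a loop like the one above: $body->attributes is a live list, so removing entries while iterating over it directly can skip some of them. A minimal sketch of one way around that, assuming it is acceptable to collect the attribute names first and remove them in a second pass (the sample markup is invented for illustration):

```php
<?php
$dom = new DOMDocument;
$dom->loadHTML('<html><body bgcolor="#b23bba" style="margin: 0;" class="x"><p>hi</p></body></html>');
$body = $dom->getElementsByTagName('body')->item(0);

// First pass: take a snapshot of the attribute names without modifying anything.
$names = array();
foreach ($body->attributes as $attr) {
    $names[] = $attr->nodeName;
}

// Second pass: remove by name, iterating over the plain array instead of the live list.
foreach ($names as $name) {
    $body->removeAttribute($name);
}

var_dump($body->hasAttributes());
```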

As it turns out, the foreach didn't quite cut it, so here is the complete, fixed code:

$dom = new DOMDocument;
//avoid unwanted HTML entities (like &#13;) from popping up:
$str = str_replace(array("\n", "\r"), '', $str);
$dom->loadHTML($str);
$contents = $dom->getElementsByTagName('body')->item(0);
while($contents->hasAttributes())
{//as long as hasAttributes returns true, remove the first of the list
    $contents->removeAttributeNode($contents->attributes->item(0));
}
//remove last image:
$imgs = $contents->getElementsByTagName('img');//get all images
if ($imgs && $imgs->length)
{//if there are img tags:
    $last = $imgs->item($imgs->length -1);//length -1 is last element
    $last->parentNode->removeChild($last);//also works if the img is not a direct child of <body>
}
$bodyContents = $dom->saveXML($contents);
$clean = trim(substr($bodyContents, 6, -7));//remove <body> tags

And here's the proof that it works.
Now, a version of the same codepad without those pesky HTML entities.

And finally, a codepad that removes the last img tag from the DOM, too.
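The code above removes whichever img happens to come last. Since the question asks specifically for the image whose width and height are both 1, a more targeted variant could select on those attributes with DOMXPath instead. This is a sketch under the assumption that the tracking image always carries width="1" and height="1" exactly as in the sample template:

```php
<?php
// Sample input, shortened from the template in the question.
$str = '<html><body>'
     . '<table><tr><td><img src="https://www.myurlname.com/anotherimg.jpg" /></td></tr>'
     . '<tr><td>needed content</td></tr></table>'
     . '<img src="https://www.myurlname.com/e68f2e83c811d6bdb32876041a1cfa78.gif" width="1" height="1" />'
     . '</body></html>';

$dom = new DOMDocument;
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);

// Copy the matches into an array first so that removing nodes cannot disturb the iteration.
$pixels = iterator_to_array($xpath->query('//body//img[@width="1" and @height="1"]'));
foreach ($pixels as $img) {
    $img->parentNode->removeChild($img);
}

$body = $dom->getElementsByTagName('body')->item(0);
echo $dom->saveHTML($body), "\n";
```

The other image inside the table is left alone because it has no width or height attributes.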
