I have some HTML templates in this format:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>myTitle</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body bgcolor="#b23bba" style="background-color: #b23bba; margin: 0;">
<table>
<tr><td><img src="https://www.myurlname.com/anotherimg.jpg" /></td></tr>
<tr><td>needed content</td></tr>
</table>
<img src="https://www.myurlname.com/e68f2e83c811d6bdb32876041a1cfa78.gif" width="1" height="1" />
</body>
</html>
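What I need to do is strip this template down, take only part of it, and insert that part into another template that already has the general HTML tags such as html, head, and body. What I really need is to keep only what is between the body tags, but without the image whose height and width are both 1px.
For this particular case, I have to keep only the table. I should mention that all of this content is stored in a PHP variable. Is there a solution for this?
Answer 0: (score: 1)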
Well, given that you have a complete and valid DOM, you can parse it, query the <body> node, and store it. It only takes a few lines of code using the DOMDocument class:
$dom = new DOMDocument;
$dom->loadHTML($str);
$contents = $dom->getElementsByTagName('body')->item(0);
$bodyContents = $dom->saveXML($contents);
This will produce:
<body><!-- your markup here --></body>
To remove the body tags, a simple substr call will do (6 is the length of "<body>", and -7 cuts off the trailing "</body>"):
$clean = substr($bodyContents, 6, -7);
That's it! Here's a fuller example, by the way.
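The fixed substr offsets only hold when the serialized tag is exactly `<body>`. As an alternative, here is a minimal sketch that serializes the children of the body node one by one instead of cutting the wrapper off afterwards. The helper name `bodyInnerHtml` and the sample string are made up for illustration and are not part of the original answer; it assumes PHP 5.3.6 or later, where `DOMDocument::saveHTML()` accepts a node argument.

```php
<?php
// Hypothetical helper (not from the original answer): returns the markup
// inside <body> without relying on fixed substr offsets.
function bodyInnerHtml($html)
{
    $dom = new DOMDocument();
    // Template markup is often not strictly valid, so collect libxml
    // warnings internally instead of letting them be emitted.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Serialize each child of <body> and concatenate the results.
    $inner = '';
    foreach ($body->childNodes as $child) {
        $inner .= $dom->saveHTML($child);
    }
    return trim($inner);
}

// Sample input, assumed for illustration only.
$str = '<html><head><title>myTitle</title></head>'
     . '<body bgcolor="#b23bba"><table><tr><td>needed content</td></tr></table></body></html>';
$out = bodyInnerHtml($str);
echo $out, "\n";
```

Since the body element itself is never serialized, any attributes on it do not matter for this approach.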
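Of course, if your <body> tag may contain attributes, you have to remove those first, otherwise the substr offsets above no longer match. In general, something like this should work: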
$body = $dom->getElementsByTagName('body')->item(0);
if ($body->hasAttributes())
{
foreach($body->attributes as $attr)
{
$body->removeAttributeNode($attr);
}
}
All of this is documented quite well on the official PHP pages.
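One caveat with the loop above: `$body->attributes` is a DOMNamedNodeMap tied to the element, so removing entries while iterating over it can leave some attributes behind. A minimal sketch of one way around that, assuming it is acceptable to copy the map into a plain array first (the sample markup is made up for illustration):

```php
<?php
$dom = new DOMDocument();
// Sample input, assumed for illustration only.
$dom->loadHTML('<html><body bgcolor="#b23bba" style="margin: 0;"><p>needed content</p></body></html>');
$body = $dom->getElementsByTagName('body')->item(0);

// Copy the attribute map into a plain array first, so that removing
// attributes does not disturb the iteration.
foreach (iterator_to_array($body->attributes, false) as $attr) {
    $body->removeAttributeNode($attr);
}

var_dump($body->hasAttributes());
```

After the loop, `hasAttributes()` should report false, because every attribute in the snapshot has been removed.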
As it turns out, the foreach didn't quite cut it, so here is the complete, fixed code:
$dom = new DOMDocument;
//strip line breaks so they don't show up as unwanted character entities in the saveXML output:
$str = str_replace(array("\n", "\r"), '', $str);
$dom->loadHTML($str);
$contents = $dom->getElementsByTagName('body')->item(0);
while($contents->hasAttributes())
{//as long as hasAttributes returns true, remove the first of the list
$contents->removeAttributeNode($contents->attributes->item(0));
}
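//remove last image (the 1x1 tracking pixel in this template):
$imgs = $contents->getElementsByTagName('img');//get all images
if ($imgs->length)
{//if there are img tags:
$last = $imgs->item($imgs->length - 1);//length - 1 is the last element
$last->parentNode->removeChild($last);//remove via its own parent, which need not be <body>
}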
$bodyContents = $dom->saveXML($contents);
$clean = trim(substr($bodyContents, 6, -7));//remove <body> tags
And here's the proof that it works.
Now, a version of the same codepad without those pesky HTML entities.
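The code above removes whichever image comes last. If the tracking pixel is not guaranteed to be the last image, a sketch along these lines targets images by their dimensions instead, which is closer to what the question asks for. It assumes the width and height attributes are written exactly as "1", as in the question's template; the sample string is a shortened copy of that template.

```php
<?php
// Shortened copy of the question's template, for illustration only.
$str = '<html><body>'
     . '<table><tr><td><img src="https://www.myurlname.com/anotherimg.jpg" /></td></tr>'
     . '<tr><td>needed content</td></tr></table>'
     . '<img src="https://www.myurlname.com/e68f2e83c811d6bdb32876041a1cfa78.gif" width="1" height="1" />'
     . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);

// Select only the images whose width and height attributes are both "1".
$pixels = $xpath->query('//img[@width="1" and @height="1"]');

// Copy the matches into an array, then detach each one from its own parent.
foreach (iterator_to_array($pixels, false) as $img) {
    $img->parentNode->removeChild($img);
}

// Count the images that are left in the document.
$remaining = $dom->getElementsByTagName('img');
echo $remaining->length, "\n";
```

The image inside the table has no width or height of 1, so it is left in place, and the rest of the fixed code (attribute removal, saveXML, substr) can run on the result unchanged.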