Question

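Is there a way in PHP to extract the content of an HTML page, from the opening <body> tag to the closing </body> tag? I'd appreciate it if someone could post some sample code.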

Answer 1

You should take a look at the DOMDocument reference.

This example reads an HTML document, creates a DOMDocument, and gets the body element:

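// Collect libxml warnings internally instead of printing them for malformed HTML
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTMLFile('http://example.com');
libxml_clear_errors();
libxml_use_internal_errors(false);

// item(0) returns null when no <body> element was found
$body = $dom->getElementsByTagName('body')->item(0);

if ($body !== null) {
    echo $body->textContent; // print all the text content in the body
}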

You should also look at the following resources:

DOM API Documentation
XPATH language specification
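Note that textContent returns only the text, with the tags stripped. If the markup between <body> and </body> is what you need, one possible approach is to serialize each child of the body element with saveHTML(). The following is a minimal sketch, not a definitive implementation: the helper name bodyInnerHtml and the sample string are made up for illustration, and it parses a string with loadHTML() instead of fetching a URL so that it runs on its own.

```php
<?php
// Hypothetical helper: returns the markup found inside the <body> element.
function bodyInnerHtml(string $html): string
{
    // Collect parser warnings internally, then restore the previous setting
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $inner = '';
    foreach ($body->childNodes as $child) {
        // saveHTML($node) serializes a single node, tags included
        $inner .= $dom->saveHTML($child);
    }
    return $inner;
}

// Sample input (an assumption for this sketch, not from the question)
$page = '<html><head><title>Demo</title></head>'
      . '<body><h1>Hello</h1><p>World</p></body></html>';
echo bodyInnerHtml($page);
```

The parser may normalize the markup (for example, whitespace between elements), so the result is not guaranteed to be byte-for-byte identical to the source.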

Answer 2

Try PHP Simple HTML DOM Parser:

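// Requires the Simple HTML DOM library to be included first
$html = file_get_html('http://www.example.com/');
$body = $html->find('body', 0); // index 0 returns the first match rather than an array
echo $body->innertext;          // the markup between <body> and </body>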

Answer 3

You can also try a non-DOM solution based on the strpos family of functions:

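$html = file_get_contents($url);
// Drop everything up to and including "<body>" (6 characters); assumes the tag has no attributes
$html = substr($html, stripos($html, '<body>') + 6);
// Cut at the last "</body>"
$html = substr($html, 0, strripos($html, '</body>'));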

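stripos is the case-insensitive version of strpos, and strripos is the case-insensitive version of strrpos, which finds the position of the last (rightmost) occurrence.

Hope this helps!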
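The snippet above only works when the opening tag is written exactly as <body> and both tags are present. Here is a slightly more defensive sketch of the same string-based idea, under some assumptions: the function name extractBody is hypothetical, no attribute value contains a ">" character, and the first "<body" in the page really is the body tag.

```php
<?php
// Hypothetical helper: returns the markup between the body tags,
// or null when no complete <body ...> ... </body> pair is found.
function extractBody(string $html): ?string
{
    // Match "<body" without the closing ">" so attributes are tolerated
    $open = stripos($html, '<body');
    if ($open === false) {
        return null;
    }

    // The content starts right after the ">" that ends the opening tag
    $start = strpos($html, '>', $open);
    if ($start === false) {
        return null;
    }
    $start++;

    // Use the last closing tag in the document
    $end = strripos($html, '</body>');
    if ($end === false || $end < $start) {
        return null;
    }

    return substr($html, $start, $end - $start);
}

var_dump(extractBody('<html><BODY class="home"><p>Hi</p></BODY></html>')); // string(9) "<p>Hi</p>"
var_dump(extractBody('no body here'));                                     // NULL
```

Checking each result against false matters here: in the original snippet a missing tag makes stripos return false, which is treated as 0 in the arithmetic and silently yields the wrong substring.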

Extracting the content of an HTML page in PHP
