Question

I have index.html:

<html>
  <head> bla bla bla </head>
  <body class="someclass"> bla bla bla </body>
</html>

I need to get the content inside the body tag. I tried this:

<?php
$site = file_get_contents("index.html");
preg_match("/<body[^>]*>(.*?) \/body>/is", $site, $matches);
print ($matches[1]);
?>

But it doesn't output anything. Please tell me what the problem is. Thanks.

Answer 1

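<?php
$site = file_get_contents("index.html");
// Match the opening <body ...> tag, then lazily capture everything up to
// the closing </body> tag. "i" = case-insensitive, "s" = "." also matches newlines.
if (preg_match("/<body.*?>(.*?)<\/body>/is", $site, $matches)) {
    print($matches[1]);
}
?>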

Answer 2

This may not be the answer you're looking for, but I suggest trying PHP's DOMDocument.
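A minimal sketch of the DOMDocument approach, assuming the ext-dom extension is enabled; the function name bodyInnerHtml is hypothetical. It parses the page, takes the first <body> element, and serializes each of its child nodes with saveHTML(), which together form the body's inner HTML.

```php
<?php
// Hypothetical helper: return the inner HTML of the first <body> element,
// or null if the markup cannot be parsed or has no body.
function bodyInnerHtml(string $html): ?string
{
    $doc = new DOMDocument();

    // Collect libxml warnings about imperfect markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return null;
    }

    // Concatenate the serialized children: this is the "inner" HTML.
    $inner = '';
    foreach ($body->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}

// Usage with a file would be: bodyInnerHtml(file_get_contents('index.html'))
echo bodyInnerHtml('<html><head><title>t</title></head><body class="someclass"><p>bla bla bla</p></body></html>');
```

Unlike a regex, the parser repairs malformed markup before you query it, so the result can differ slightly from the raw text between the tags; the usage line uses well-formed markup for that reason.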

Answer 3

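"/<body[^>]*>(.*?) \/body>/is" should be "/<body[^>]*>(.*?)<\/body>/is". Your pattern has a space where the "<" of the closing tag should be, so it never matches "</body>" and $matches[1] is never set.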
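To see the difference concretely, this small check runs both patterns against the sample markup from the question (written on one line); $broken and $fixed are illustrative names, not part of the original code.

```php
<?php
// Pattern from the question: " \/body>" requires a literal space
// immediately before "/body>", but the markup contains "</body>".
$broken = '/<body[^>]*>(.*?) \/body>/is';
// Corrected pattern: match the full closing tag "<\/body>".
$fixed = '/<body[^>]*>(.*?)<\/body>/is';

// Sample markup from the question's index.html, on one line.
$site = '<html> <head> bla bla bla </head> <body class="someclass"> bla bla bla </body> </html>';

// preg_match() returns 1 on a match, 0 on no match, false on error.
var_dump(preg_match($broken, $site, $m));  // int(0): nothing captured
var_dump(preg_match($fixed, $site, $m));   // int(1)
var_dump($m[1]);                           // string(13) " bla bla bla "
```

Checking the return value of preg_match() before reading $matches[1] avoids silently printing nothing when the pattern fails.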

Answer 4

You should take a look at PHP Simple HTML DOM Parser: http://simplehtmldom.sourceforge.net/

You can get the body with something like this:

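require_once 'simple_html_dom.php'; // the library file from the link above
$html = file_get_html('index.html');
$body = $html->find('body', 0); // index 0 returns the first match instead of an array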

Then you can get the inner HTML like this:

$content = $body->innertext;