Question

I have been trying to get the inner text of HTML tags from a URL (defimedia.info), but I only get one result. The code I tried is:

$html = file_get_contents("http://www.defimedia.info");
preg_match("'<h3>(.*?)<h3>'si", $html, $match);
echo($match[1]);

Even when I try using foreach, or $match[2], it does not work. Any help would be greatly appreciated.

Regards, bhaamb
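To see why only one result comes back: preg_match() stops at the first match, so $match holds only the full match in $match[0] and the single capture group in $match[1]. $match[2] would be a second capture group, which this pattern does not have. The minimal sketch below assumes a hypothetical inline HTML string in place of the live page and writes the closing tag as </h3>.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
$html = '<h3>First</h3><p>x</p><h3>Second</h3>';

// preg_match() returns 1 and stops after the first successful match.
$found = preg_match('~<h3>(.*?)</h3>~si', $html, $match);

// $match[0] is the whole matched text, $match[1] the first capture group.
// There is no $match[2], because the pattern has only one group.
echo $found, "\n";        // 1
echo $match[1], "\n";     // First
echo count($match), "\n"; // 2
```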

Answer 1

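You need the preg_match_all function, documented at http://php.net/manual/en/function.preg-match-all.php. preg_match stops after the first match, while preg_match_all returns all of them. Also note that the closing tag in the pattern must be </h3>, not <h3>.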

Try it like this:

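<?php
$html = file_get_contents("http://www.defimedia.info");
// Collect every <h3>...</h3>; the closing tag is </h3>, not <h3>.
preg_match_all('/<h3>(.*?)<\/h3>/si', $html, $match);
// $match[1] holds the inner text of each heading.
print_r($match[1]);
?>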
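A self-contained variant of the answer above, assuming a hypothetical inline HTML string so it runs without network access. preg_match_all() returns the number of matches, and with its default PREG_PATTERN_ORDER flag $matches[1] is an array of every first-group capture. Like the answer's pattern, it only matches a bare <h3> tag, so a heading such as <h3 class="x"> would be missed.

```php
<?php
// Hypothetical sample markup; in practice this would come from file_get_contents().
$html = "<h3>First</h3>\n<p>x</p>\n<h3>Second</h3>";

// preg_match_all() keeps searching after each match and returns the match count.
$count = preg_match_all('~<h3>(.*?)</h3>~si', $html, $matches);

// With the default PREG_PATTERN_ORDER, $matches[1] lists every captured inner text.
foreach ($matches[1] as $text) {
    echo trim($text), "\n";
}
echo "Total: $count\n"; // Total: 2
```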

Answer 2

Regex is not the right tool for parsing HTML or XML. Use DOMDocument instead, like this:

$html = file_get_contents("http://www.defimedia.info");
$dom = new DOMDocument();

// Buffer HTML parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors(false);

// Print the text of every <h3> element.
$h3s = $dom->getElementsByTagName('h3');
foreach ($h3s as $h3) {
    echo $h3->nodeValue."<br>";
}
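A network-free sketch of the same DOMDocument approach, assuming a hypothetical HTML string and the standard dom extension. It uses markup the regex would struggle with (an attribute on <h3> and a nested <em>), saves and restores the previous libxml_use_internal_errors() setting, and collects the headings into an array using textContent.

```php
<?php
// Hypothetical sample markup: an h3 with an attribute and a nested tag.
$html = '<h3 class="title">First <em>story</em></h3><p>x</p><h3>Second</h3>';

$dom = new DOMDocument();
$previous = libxml_use_internal_errors(true); // buffer parser warnings
$dom->loadHTML($html);
libxml_clear_errors();                        // discard anything buffered
libxml_use_internal_errors($previous);        // restore the earlier setting

$titles = [];
foreach ($dom->getElementsByTagName('h3') as $h3) {
    // textContent includes the text of nested elements such as <em>.
    $titles[] = trim($h3->textContent);
}
print_r($titles);
```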

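Why did I use libxml_use_internal_errors(true)? Real-world pages often contain markup that libxml's HTML parser considers invalid, and DOMDocument::loadHTML() reports each problem as a PHP warning. Enabling internal errors buffers those messages instead, and switching the setting back to false discards the buffered errors.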
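A small illustration of what libxml_use_internal_errors(true) changes, using a hypothetical snippet that the HTML parser may flag (for example, older libxml2 versions report HTML5 tags such as <section> as invalid). The exact messages depend on the installed libxml2 version, so none are shown here; the point is that they are collected rather than printed, and the parser still builds a usable tree.

```php
<?php
// Hypothetical markup that the HTML parser may complain about.
$html = '<section><h3>Title</h3><p>unclosed';

$dom = new DOMDocument();

// Without this call, loadHTML() would emit each parser complaint as a PHP warning.
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($html);

// The complaints are buffered instead and can be inspected or discarded.
$errors = libxml_get_errors();
foreach ($errors as $error) {
    echo trim($error->message), "\n";
}
libxml_clear_errors();
libxml_use_internal_errors($previous);

echo $dom->getElementsByTagName('h3')->item(0)->textContent, "\n"; // Title
```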

PHP - preg_match cannot get all elements from an HTML URL
