I built this function to get the h1 tag from an HTML page:
//$html = file_get_html('https://www.sports-reference.com/olympics/summer/1896/');
//echo $html;
function getTextBetweenTags($url, $tagname) {
$values = array();
$html = file_get_html($url);
foreach($html->find($tagname) as $tag) {
$values[] = trim($tag->innertext);
}
return $values;
}
$output = getTextBetweenTags('https://www.sports-reference.com/olympics/summer/1896/', 'h1');
echo '<pre>';
print_r($output);
As output I get:
Array
(
[0] => 1896 Athina Summer Games
)
Is it possible to get this instead:
Array
(
[0] => 1896
[1] => Athina
[2] => Summer
)
I'm happy to accept other solutions. Since I'm sure the h1 tag is the only one on the page, I don't need to find every h1 tag in the HTML.
Answer 0 (score: 1)
Hope this helps.
Solution 1: (instead of return $values; you should return this)
$result=explode(" ",$values[0]);
array_pop($result);
return $result;
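To see what Solution 1 does in isolation, here is a minimal standalone sketch. It assumes the heading text from the question is already available as a string; the variable name $heading is illustrative and not part of the original code.

```php
<?php
// Heading text as it appears in $values[0] in the question (hard-coded here as an assumption).
$heading = '1896 Athina Summer Games';

// Split the heading into words on single spaces.
$result = explode(' ', $heading);

// Remove the last word ("Games") from the array.
array_pop($result);

print_r($result);
```

Note that explode(" ", ...) splits on every single space, so a heading with doubled or leading spaces would produce empty array elements.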
Solution 2:
Here we use DOMDocument to achieve the desired output.
ini_set('display_errors', 1);
function getTextBetweenTags($url, $tagname)
{
libxml_use_internal_errors(true);
$domDocument = new DOMDocument();
$domDocument->loadHTMLFile($url);
$domXPath = new DOMXPath($domDocument);
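$results = $domXPath->query("//$tagname"); // select every element with the given tag name
if ($results === false || $results->length === 0) {
return array(); // no matching tag found, so there is nothing to split
}
return explode(" ", trim($results->item(0)->textContent)); // split the first tag's text on spaces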
}
$output = getTextBetweenTags('https://www.sports-reference.com/olympics/summer/1896/', 'h1');
array_pop($output);
print_r($output);
Output:
Array
(
[0] => 1896
[1] => Athina
[2] => Summer
)
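If the heading might contain irregular whitespace (extra spaces, tabs, or newlines), one possible variation is to split with preg_split instead of explode. The sketch below is only an illustration under assumptions: the helper name splitFirstTagText and the sample markup are hypothetical, and the HTML is loaded from a string rather than from the URL so the example runs without network access.

```php
<?php
// Hypothetical helper: returns the words of the first <$tagname> element found in $html.
function splitFirstTagText(string $html, string $tagname): array
{
    libxml_use_internal_errors(true); // suppress warnings from imperfect HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);

    $nodes = $doc->getElementsByTagName($tagname);
    if ($nodes->length === 0) {
        return []; // no such tag in the document
    }

    // Split on any run of whitespace and discard empty pieces.
    return preg_split('/\s+/', trim($nodes->item(0)->textContent), -1, PREG_SPLIT_NO_EMPTY);
}

// Sample markup standing in for the real page (an assumption for this sketch).
$sample = '<html><body><h1> 1896  Athina Summer Games </h1></body></html>';

$words = splitFirstTagText($sample, 'h1');
array_pop($words); // drop the trailing "Games", as in the answer above
print_r($words);
```

To use this against the live page, the HTML string could be fetched first (for example with file_get_contents) and passed in as $html.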