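I often use XPath with PHP to parse pages, but this time I can't understand how the code below behaves on one particular page, and I hope you can help.
The code I use to parse the page http://www.jeuxvideo.com/recherche.php?m=9&t=10&q=Call+of+duty: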
<?php
$What = 'Call of duty';
$What = urlencode($What);
$Query = 'http://www.jeuxvideo.com/recherche.php?m=9&t=10&q='.$What;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $Query);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
$response = curl_exec($ch);
curl_close($ch);
/*
$search = array("<article", "</article>");
$replace = array("<div", "</div>");
$response = str_replace($search, $replace, $response);
*/
$dom = new DOMDocument();
@$dom->loadHTML($response);
$xpath = new DOMXPath($dom);
$elements = $xpath->query('//article[@class="recherche-aphabetique-item"]/a');
//$elements = $xpath->query('//div[@class="recherche-aphabetique-item"]/a');
count($elements);
var_dump($elements);
?>
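Fiddle to test it: http://phpfiddle.org/main/code/r9n6-d0j0
I just want to get all the "a" nodes inside the "article" nodes that have the class "recherche-aphabetique-item".
But the query returns nothing :/.
As you can see in the commented-out code, I tried replacing the HTML5 article element with div, but I get the same behaviour.
Thanks for your help.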
Answer 0 (score: 1)
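I see a lot of DOMDocument::loadHTML(): Unexpected end tag errors; you should use libxml's internal error handling functions to deal with them. Also, when I look at the remote site's DOM, I don't see any a tags that match the XPath query, only span tags.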
<?php
$What = 'Call of duty';
$What = urlencode($What);
$Query = 'http://www.jeuxvideo.com/recherche.php?m=9&t=10&q='.$What;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $Query);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
$response = curl_exec($ch);
curl_close($ch);
/* try to suppress errors using libxml */
libxml_use_internal_errors( true );
$dom = new DOMDocument();
/* additional flags for DOMDocument */
$dom->validateOnParse=false;
$dom->standalone=true;
$dom->strictErrorChecking=false;
$dom->recover=true;
$dom->formatOutput=false;
@$dom->loadHTML($response);
libxml_clear_errors();
$xpath = new DOMXPath($dom);
$elements = $xpath->query('//article[@class="recherche-aphabetique-item"]/span');
count( $elements );
var_dump( $elements );
?>
object(DOMNodeList)#97 (1) { ["length"]=> int(94) }
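To see what the parser actually complained about, rather than only silencing it, the collected errors can be read with libxml_get_errors() before they are cleared. The following is a minimal sketch on an inline string: the markup is a hypothetical stand-in for the remote page (the real structure may differ), and the stray closing div is there only to give the parser something to report.

```php
<?php
// Hypothetical markup standing in for the remote page; an assumption, not the site's real HTML.
$html = '<html><body>'
      . '<article class="recherche-aphabetique-item"><span>Call of Duty</span></article>'
      . '<article class="recherche-aphabetique-item"><span>Call of Duty 2</span></article>'
      . '</div>' // stray end tag, the kind of thing loadHTML() may report
      . '</body></html>';

// Collect parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Print whatever the parser collected, then empty the error buffer.
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), ' (line ', $error->line, ')', PHP_EOL;
}
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// The original query looks for <a> children; this sample only has <span> children.
$anchors = $xpath->query('//article[@class="recherche-aphabetique-item"]/a');
$spans   = $xpath->query('//article[@class="recherche-aphabetique-item"]/span');

echo 'a: ', $anchors->length, PHP_EOL;
echo 'span: ', $spans->length, PHP_EOL;
```

On this sample the /a query matches nothing while the /span query matches both elements, which mirrors what the answer observed on the live page. Which error messages get printed depends on the libxml2 version PHP was built against.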
You could try simplifying it further:
$What = 'Call of duty';
$What = urlencode($What);
$Query = 'http://www.jeuxvideo.com/recherche.php?m=9&t=10&q='.$What;
libxml_use_internal_errors( true );
$dom = new DOMDocument();
$dom->validateOnParse=false;
$dom->standalone=true;
$dom->strictErrorChecking=false;
$dom->recover=true;
$dom->formatOutput=false;
@$dom->loadHTMLFile($Query);
libxml_clear_errors();
$xpath = new DOMXPath($dom);
$elements = $xpath->query('//article[@class="recherche-aphabetique-item"]/span');
/* print the text content of each matching span */
foreach ($elements as $node) {
    echo $node->nodeValue, '<br />';
}
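If the links do exist on the page but the query still comes back empty, two properties of the XPath expression are worth checking: @class="..." compares the whole attribute value, so an element carrying an extra class will not match, and /a selects only direct children, not links nested deeper. The sketch below illustrates both points on hypothetical markup; the extra class name and the href values are invented for the example and are not taken from the site.

```php
<?php
// Hypothetical markup (an assumption): one article has an extra class and a nested link.
$html = '<html><body>'
      . '<article class="recherche-aphabetique-item extra"><span><a href="/jeu-1">Jeu 1</a></span></article>'
      . '<article class="recherche-aphabetique-item"><a href="/jeu-2">Jeu 2</a></article>'
      . '</body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Exact attribute comparison plus child axis: only the second article's link matches.
$strict = $xpath->query('//article[@class="recherche-aphabetique-item"]/a');

// Class-token match plus descendant axis: both links match.
$loose = $xpath->query(
    '//article[contains(concat(" ", normalize-space(@class), " "), " recherche-aphabetique-item ")]//a'
);

echo 'strict: ', $strict->length, PHP_EOL;
echo 'loose: ', $loose->length, PHP_EOL;

// Collect the href of every link found by the looser query.
$hrefs = [];
foreach ($loose as $link) {
    $hrefs[] = $link->getAttribute('href');
}
echo implode(', ', $hrefs), PHP_EOL;
```

Whether the looser query is appropriate depends on the real markup; it trades precision for tolerance of extra classes and wrapper elements.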