XPath returns a blank page and does not echo any values

Time: 2016-05-18 14:32:30

Tags: php

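With the code below I only get a blank page: neither the name nor the nickname is echoed. I have double-checked that the path is correct, but it still does not echo anything.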

<?php

$url="http://www.mans-best-friend.org.uk/dog-breeds-alphabetical-list.htm";

$curl_handle=curl_init();
curl_setopt($curl_handle, CURLOPT_URL,$url);
curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl_handle, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1');
$html = curl_exec($curl_handle);
curl_close($curl_handle);

$mydoc = new DOMDocument();

libxml_use_internal_errors(TRUE); //disable libxml errors

if(empty($html)) die("EMPTY HTML");

    $mydoc->loadHTML($html);
    libxml_clear_errors(); //remove errors for yucky html

    $my_xpath = new DOMXPath($mydoc);

    //////////////////////////////////////////////////////

    $nodes = $my_xpath->query( '//*[@id="table94"]/tbody/tr/td' );    

    foreach( $nodes as $node )
    {  
    $title=$my_xpath->query( 'p[@data-iceapc="1"]/span/a/font', $node );
    $nickname=$my_xpath->query( 'p[@data-iceapc="2"]/span/a/font', $node );
    echo $title." ".$nickname."<br>";     
    }

?>

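If you can't find the p element, scroll to the section where the dog names are listed. For example, right-click Affenpinscher and select Inspect; it shows the p element.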

1 answer:

Answer 0 (score: 0)

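First, you have to "fix" the HTML before XPath can work on it, because the page contains too many markup errors. In this case, I extract only the table that is needed, the one with id table94.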

After that, you can run XPath queries on the DOM object to get the data you need:

<?php
$url="http://www.mans-best-friend.org.uk/dog-breeds-alphabetical-list.htm";

$curl_handle=curl_init();
curl_setopt($curl_handle, CURLOPT_URL,$url);
curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl_handle, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1');
$html = curl_exec($curl_handle);
curl_close($curl_handle);

// Keep only the markup of the table with id "table94"; the rest of the page is discarded.
$html = preg_replace('/^.*(<table[^>]*id="table94">.*?<\/table>).*$/is', '\1', $html);

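$mydoc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$mydoc->loadHTML($html);
libxml_clear_errors();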

$my_xpath = new DOMXPath($mydoc);

$nodes = $my_xpath->query( '//tr' );    

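foreach( $nodes as $node )
{
    // The name is read from the second-to-last cell, the nickname from the last cell.
    $title = $my_xpath->query('td[position()=last()-1]/p/span/a/font', $node);
    $nickname = $my_xpath->query('td[position()=last()]/p/span/font', $node);
    if ($title->length > 0) {
        echo $title->item(0)->textContent.' ';
        // Guard against rows without a nickname so item(0) is never read from an empty list.
        echo ($nickname->length > 0 ? $nickname->item(0)->textContent : '')."<br />";
    }
}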
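
As a side note on why the code in the question likely printed nothing, here is a minimal sketch under stated assumptions. DOMXPath::query() returns a DOMNodeList, not a string, so it cannot be passed to echo directly; the text has to be read from a node, for example via item(0)->textContent. In addition, markup seen in a browser's inspector (a tbody element, or attributes added by the browser) may not exist in the HTML that cURL downloads. The two-row table below is a made-up stand-in for the real page, and its cell layout and sample values are assumptions for illustration only.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
$html = <<<HTML
<table id="table94">
  <tr><td><p><span><a href="#"><font>Affenpinscher</font></a></span></p></td>
      <td><p><span><font>Monkey Dog</font></span></p></td></tr>
  <tr><td><p><span><a href="#"><font>Akita</font></a></span></p></td>
      <td><p><span><font>Akita Inu</font></span></p></td></tr>
</table>
HTML;

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Use the descendant axis (//tr) so the query matches whether or not
// a <tbody> element sits between the table and its rows.
$rows = $xpath->query('//table[@id="table94"]//tr');

$breeds = [];
foreach ($rows as $row) {
    // Each query() call returns a DOMNodeList relative to the current row.
    $title = $xpath->query('td[1]/p/span/a/font', $row);
    $nick  = $xpath->query('td[2]/p/span/font', $row);

    // Skip rows that do not have both cells.
    if ($title->length === 0 || $nick->length === 0) {
        continue;
    }

    // Read the text from the first matching node of each list.
    $breeds[] = trim($title->item(0)->textContent) . ' ' . trim($nick->item(0)->textContent);
}

echo implode("<br />\n", $breeds), "\n";
```

The sketch collects the strings into an array before printing, which keeps the extraction step separate from the output step and makes it easy to check the result.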