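With the code below I only get a blank page: neither the name nor the nickname is echoed. I double-checked that the XPath is correct, but it still prints nothing.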
<?php
$url="http://www.mans-best-friend.org.uk/dog-breeds-alphabetical-list.htm";
$curl_handle=curl_init();
curl_setopt($curl_handle, CURLOPT_URL,$url);
curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl_handle, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1');
$html = curl_exec($curl_handle);
curl_close($curl_handle);
$mydoc = new DOMDocument();
libxml_use_internal_errors(TRUE); //disable libxml errors
if(empty($html)) die("EMPTY HTML");
$mydoc->loadHTML($html);
libxml_clear_errors(); //remove errors for yucky html
$my_xpath = new DOMXPath($mydoc);
//////////////////////////////////////////////////////
$nodes = $my_xpath->query( '//*[@id="table94"]/tbody/tr/td' );
foreach( $nodes as $node )
{
$title=$my_xpath->query( 'p[@data-iceapc="1"]/span/a/font', $node );
$nickname=$my_xpath->query( 'p[@data-iceapc="2"]/span/a/font', $node );
echo $title." ".$nickname."<br>";
}
?>
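If you cannot find the p elements, scroll to the section of the page where the dog names are listed. For example, right-click Affenpinscher and choose Inspect; the inspector shows the p element.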
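Answer 0 (score: 0)
First, you have to "fix" the HTML before XPath will work on it, because the page contains too many errors. Here I simply extract the one table that is needed, the one with id table94.
After that, you can run XPath queries on the DOM object to get the data you want: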
<?php
$url="http://www.mans-best-friend.org.uk/dog-breeds-alphabetical-list.htm";
$curl_handle=curl_init();
curl_setopt($curl_handle, CURLOPT_URL,$url);
curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl_handle, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1');
$html = curl_exec($curl_handle);
curl_close($curl_handle);
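// Keep only the table with id="table94"; the rest of the page contains too many errors.
$html = preg_replace('/^.*(<table[^>]*id="table94">.*?<\/table>).*$/is', '\1', $html);
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$mydoc = new DOMDocument();
$mydoc->loadHTML($html);
libxml_clear_errors();
$my_xpath = new DOMXPath($mydoc);
$nodes = $my_xpath->query( '//tr' );
foreach( $nodes as $node )
{
    // Breed name: link text in the second-to-last cell; nickname: text in the last cell.
    $title = $my_xpath->query('td[position()=last()-1]/p/span/a/font', $node)->item(0);
    $nickname = $my_xpath->query('td[position()=last()]/p/span/font', $node)->item(0);
    if ($title !== null) {
        // A row without a nickname prints the name alone instead of failing on a null node.
        echo $title->textContent.' '.($nickname !== null ? $nickname->textContent : '')."<br />";
    }
}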
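
To try the same idea without hitting the network, here is a minimal offline sketch. The HTML string is a made-up stand-in for the real table (the second breed and both nicknames are placeholders, not data from the site), and the cell layout is assumed to match the XPath used above. It illustrates two points that are easy to miss: `DOMXPath::query()` returns a `DOMNodeList`, so you have to read `->item(0)->textContent` rather than echo the list itself, and the `tbody` you see in the browser inspector is usually inserted by the browser and is not present in what `loadHTML()` parses, so a path going through `/tbody/` matches nothing.

```php
<?php
// Hypothetical sample markup: like raw page source, it has no <tbody>.
$html = '<table id="table94">'
      . '<tr><td><p><span><a href="#"><font>Affenpinscher</font></a></span></p></td>'
      . '<td><p><span><font>Sample nickname A</font></span></p></td></tr>'
      . '<tr><td><p><span><a href="#"><font>Example Breed</font></a></span></p></td>'
      . '<td><p><span><font>Sample nickname B</font></span></p></td></tr>'
      . '</table>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// "//tr" under the table matches rows whether or not a <tbody> exists.
$rows = $xpath->query('//table[@id="table94"]//tr');

$breeds = [];
foreach ($rows as $row) {
    // query() returns a DOMNodeList; item(0) is the first match or null.
    $name = $xpath->query('td[1]/p/span/a/font', $row)->item(0);
    $nick = $xpath->query('td[2]/p/span/font', $row)->item(0);
    if ($name === null) {
        continue; // skip rows that do not hold a breed entry
    }
    $breeds[] = [
        trim($name->textContent),
        $nick !== null ? trim($nick->textContent) : '',
    ];
}

foreach ($breeds as $breed) {
    echo $breed[0] . ' ' . $breed[1] . "<br />\n";
}
```

If this prints the rows you expect, swap the sample string for the downloaded `$html`; if the real page then prints nothing, the likely culprit is the markup of the page rather than the XPath logic.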