Here is my code. I want to find the URL of this image. The site generates random images, so I always want to grab the image at this XPath:
xPath = /html/body/section[3]/div/article/div/div[2]/div/div[2]/div/div/a/img
Here is my code:
$url = "http://www.funcloud.com/random";
$result = (string) reset(simplexml_import_dom(DOMDocument::loadHTML($url))->xpath("//html/body/section[3]/div/article/div/div[2]/div/div[2]/div/div/a/img@src"));
if($result == null)
    echo 'result is null';
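For comparison, here is a minimal sketch of the extraction step. It assumes the page's HTML has already been fetched into a string, for example with file_get_contents($url). It differs from the code above in two ways:

- The markup, not the URL, is passed to loadHTML() on a DOMDocument instance.
- The attribute is selected with a separate /@src step. `img@src` is not valid XPath.

The extractImageSrc() helper and the sample markup are hypothetical. The markup is built only to match the path from the question.

```php
<?php
// Hypothetical helper: return the value selected by $xpath in an HTML string,
// or null if nothing matches. Assumes the HTML was fetched beforehand.
function extractImageSrc(string $html, string $xpath): ?string
{
    $dom = new DOMDocument();
    // Real-world (and HTML5) markup triggers libxml warnings; collect them quietly.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // query() returns false for a malformed expression, else a DOMNodeList.
    $nodes = (new DOMXPath($dom))->query($xpath);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    // For an attribute node (the "/@src" step), nodeValue is the attribute value.
    return $nodes->item(0)->nodeValue;
}

// Hypothetical sample markup shaped like the path from the question.
$html = '<html><body><section></section><section></section>'
    . '<section><div><article><div><div></div><div><div><div></div><div><div><div>'
    . '<a href="#"><img src="/images/sample.jpg"></a>'
    . '</div></div></div></div></div></div></article></div></section>'
    . '</body></html>';

// The question's path with the attribute step written as "/@src".
$xpath = '//html/body/section[3]/div/article/div/div[2]/div/div[2]/div/div/a/img/@src';
echo extractImageSrc($html, $xpath) ?? 'result is null', PHP_EOL;
```

To run this against the live page, pass it the fetched markup instead of the sample. If it prints "result is null", the path (or the page) has changed.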
Any ideas?
Answer 0 (score: 0)
Here's how to debug it:
<?php
$url = "http://www.funcloud.com/random";

// The question's XPath, split into one step per element below <html>.
$parts = array(
    'body',
    'section[3]',
    'div',
    'article',
    'div',
    'div[2]',
    'div',
    'div[2]',
    'div',
    'div',
    'a',
    'img',
);

// Same loading call as in the question.
$document = simplexml_import_dom(DOMDocument::loadHTML($url));

// Extend the XPath one step at a time and stop at the first step that
// matches nothing; $previous_result keeps the last non-empty match.
$xpath = '//html';
$previous_result = array();
foreach($parts as $part)
{
    $xpath .= '/' . $part;
    $result = $document->xpath($xpath);
    if( ! count($result) )
    {
        echo('No content for: ' . $xpath . PHP_EOL);
        break;
    }
    $previous_result = $result;
}
echo('Last good result: ' . PHP_EOL);
var_dump($previous_result);
This gives me:
No content for: //html/body/section[3]
Last good result:
array(1) {
[0] =>
class SimpleXMLElement#1 (1) {
public $p =>
string(30) "http://www.funcloud.com/random"
}
}
Looks like you need to adjust your XPath.
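The dump above also hints at why even section[3] is missing: the only element under <body> is a <p> holding the URL text. DOMDocument::loadHTML() parses its argument as HTML markup and does not fetch anything. Passing it $url therefore builds a tiny document out of the URL string itself.

Here is a minimal sketch that reproduces this without the site being reachable. It only loads the string; nothing here touches the network.

```php
<?php
// loadHTML() treats the string as markup; bare text ends up wrapped in <p>.
$dom = new DOMDocument();
$dom->loadHTML('http://www.funcloud.com/random');

$xp = new DOMXPath($dom);

// List every element directly under <body> with its text content.
foreach ($xp->query('/html/body/*') as $el) {
    echo $el->nodeName, ': ', $el->textContent, PHP_EOL;
}

// There is no <section> in this tree, so the first step past <body> finds nothing.
echo 'section[3] matches: ', $xp->query('/html/body/section[3]')->length, PHP_EOL;
```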
Edit: Actually, it looks like the service has been shut down:
% curl -D /dev/stdout -o /dev/null -s http://www.funcloud.com/random
HTTP/1.1 301 Moved Permanently
Server: nginx/1.2.1
Date: Mon, 11 Aug 2014 10:04:55 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
X-Powered-By: PHP/5.4.4-14+deb7u12
Set-Cookie: PHPSESSID=1t636677vk8uor7d6vfo3n4c91; path=/; domain=.funcloud.com
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Location: http://funcloud.com/
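The scraper can notice this kind of change on its own by checking for a 3xx status before parsing. Below is a minimal sketch. It assumes the raw header lines are available as an array, like the curl output above or the array returned by PHP's get_headers(). redirectTarget() is a hypothetical helper name, and the sample headers are copied from the response above.

```php
<?php
// Hypothetical helper: if the first status line is a 3xx redirect,
// return the Location value; otherwise return null.
function redirectTarget(array $headerLines): ?string
{
    // Status line, e.g. "HTTP/1.1 301 Moved Permanently".
    if (!preg_match('#^HTTP/\S+\s+(\d{3})#', $headerLines[0] ?? '', $m)) {
        return null;
    }
    $status = (int) $m[1];
    if ($status < 300 || $status >= 400) {
        return null; // not a redirect
    }
    foreach ($headerLines as $line) {
        // Header names are case-insensitive.
        if (stripos($line, 'Location:') === 0) {
            return trim(substr($line, strlen('Location:')));
        }
    }
    return null;
}

// Header lines taken from the curl output above.
$headers = [
    'HTTP/1.1 301 Moved Permanently',
    'Server: nginx/1.2.1',
    'Location: http://funcloud.com/',
];
echo redirectTarget($headers) ?? 'no redirect', PHP_EOL;
```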