XPath query does not work on this website

Time: 2016-12-10 13:54:38

Tags: php xpath domdocument

Here is the code I use to get a website's title:

$finder = new DomXPath($doc);
$title = $finder->query('/html/head/title')->item(0)->textContent;
die($title);

It works fine on some websites:
http://www.beytoote.com/news/politics-social/jnews151207.html

But it does not work on this particular page:
http://www.jamnews.ir/detail/News/742550

What is the problem?

1 answer:

Answer 0: (score: 0)

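This particular site seems to block requests made with PHP cURL or file_get_contents and returns an error message instead of the page. If you set a user agent, it seems to work fine. I would also run the response through php-tidy in case the HTML contains errors.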

<?php
// Fetch the page with cURL, sending a browser-like User-Agent
// so the site does not reject the request.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.jamnews.ir/detail/News/742550');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
$source = curl_exec($ch);
if ($source === false) {
    die('cURL error: ' . curl_error($ch));
}

// Let tidy repair the markup before handing it to DOMDocument.
$config = array(
    'indent'       => true,
    'output-xhtml' => true,
    'force-output' => true,
    'wrap'         => 200,
);

$tidy = new tidy;
$tidy->parseString($source, $config, 'utf8');
$tidy->cleanRepair();

// Casting the tidy object to string yields the repaired markup.
$doc = new DOMDocument();
$doc->loadHTML((string) $tidy);
$finder = new DOMXPath($doc);
$title = $finder->query('/html/head/title')->item(0)->textContent;
die($title);

This gives:

جام نیوز :: JamNews - اجازه عربستان به اسرائیل برای حمله به ایران
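
The same idea should also carry over to file_get_contents, which the answer mentions but does not show. Below is a minimal sketch, not the answerer's code: `makeUserAgentContext` and `extractTitle` are hypothetical helper names, the 10-second timeout is an arbitrary choice, and it skips the tidy step. It also returns null when no title node is found, instead of calling `->textContent` on a missing node, which is likely what went wrong in the question when the site returned an error page.

```php
<?php
// Hypothetical helper: builds a stream context that sends a User-Agent header.
function makeUserAgentContext(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'user_agent' => $userAgent,
            'timeout'    => 10, // seconds; arbitrary value chosen for this sketch
        ],
    ]);
}

// Hypothetical helper: returns the <title> text, or null if the page has none.
function extractTitle(string $html): ?string
{
    if (trim($html) === '') {
        return null; // nothing to parse
    }

    // Collect libxml warnings internally instead of printing them for sloppy markup.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $finder = new DOMXPath($doc);
    $nodes = $finder->query('/html/head/title');
    if ($nodes === false || $nodes->length === 0) {
        return null; // no title node, e.g. a blocked or error response
    }

    return trim($nodes->item(0)->textContent);
}

// Usage (performs a network request, so it is left commented out here):
// $context = makeUserAgentContext('Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
// $html = file_get_contents('http://www.jamnews.ir/detail/News/742550', false, $context);
// $title = ($html === false) ? null : extractTitle($html);
```

Checking for null before printing makes it easier to tell a blocked request apart from a page that genuinely has no title.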