Here is my code for getting a website's title:
$finder = new DomXPath($doc);
$title = $finder->query('/html/head/title')->item(0)->textContent;
die($title);
It works fine on some sites:
http://www.beytoote.com/news/politics-social/jnews151207.html
But it does not work on this particular page:
http://www.jamnews.ir/detail/News/742550
What is the problem?
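Answer 0 (score: 0)
This particular site seems to block requests made with PHP cURL or file_get_contents and returns an error message instead of the page. If you set a user agent, it seems to work fine. I would also run the result through php-tidy in case there are errors in the HTML.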
<?php
// Fetch the page with cURL, sending a browser-like User-Agent header.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.jamnews.ir/detail/News/742550');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
$source = curl_exec($ch);
if ($source === false) {
    die('cURL error: ' . curl_error($ch));
}

// Repair any malformed HTML with tidy before parsing it.
$config = array(
    'indent' => true,
    'output-xhtml' => true,
    'force-output' => true,
    'wrap' => 200);
$tidy = new tidy;
$tidy->parseString($source, $config, 'utf8');
$tidy->cleanRepair();

// Load the repaired markup and read the <title> element.
$doc = new DOMDocument();
$doc->loadHTML(tidy_get_output($tidy));
$finder = new DomXPath($doc);
$title = $finder->query('/html/head/title')->item(0)->textContent;
die($title);
This gives:
جام نیوز :: JamNews - اجازه عربستان به اسرائیل برای حمله به ایران
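Since the answer mentions file_get_contents as well, here is a rough sketch of the same idea without cURL, with a guard for pages that have no title so that item(0) is never dereferenced when it is null. The function names extractTitle and fetchWithUserAgent are made up for illustration, the 10-second timeout is an arbitrary choice, and the tidy step is left out to keep it short. This has not been checked against the site in the question.

```php
<?php
// Hypothetical helper: returns the text of /html/head/title, or null if there is none.
function extractTitle(string $html): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML rejects an empty string
    }

    // Collect parser warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $finder = new DOMXPath($doc);
    $nodes = $finder->query('/html/head/title');
    if ($nodes === false || $nodes->length === 0) {
        return null; // no <title>, e.g. the server sent an error page
    }
    return trim($nodes->item(0)->textContent);
}

// Hypothetical helper: fetches a URL with file_get_contents and a User-Agent header.
function fetchWithUserAgent(string $url, string $userAgent): ?string
{
    $context = stream_context_create(array(
        'http' => array(
            'user_agent' => $userAgent,
            'timeout'    => 10, // seconds; assumed value
        ),
    ));
    $source = @file_get_contents($url, false, $context);
    return $source === false ? null : $source;
}

// Usage (performs a network request, so it is left commented out):
// $html = fetchWithUserAgent('http://www.jamnews.ir/detail/News/742550', 'Mozilla/5.0');
// echo $html === null ? 'request failed' : (extractTitle($html) ?? 'no title found');
```

Returning null instead of calling ->textContent directly makes it easier to tell a blocked request apart from a page that was fetched but has no title.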