I am using this code to fetch the content of a URL that the user enters:
class MetaTagParser
{
    public $metadata = [];
    private $html;
    private $url;

    public function __construct($url)
    {
        $this->url = $url;
        $this->html = $this->file_get_contents_curl();
        $this->set_title();
        $this->set_meta_properties();
    }

    // Fetch the page body with cURL, following redirects.
    public function file_get_contents_curl()
    {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_URL, $this->url);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        $data = curl_exec($ch);
        curl_close($ch);
        return $data;
    }

    // Read the <title> element from the fetched HTML.
    public function set_title()
    {
        $doc = new DOMDocument();
        @$doc->loadHTML($this->html);
        $nodes = $doc->getElementsByTagName('title');
        // Avoid a fatal error when the page has no <title>.
        $this->metadata['title'] = $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;
    }

    // set_meta_properties() was not included in the post.
}
This class works for some pages, but for some URLs, such as http://www.dnaindia.com/india/report_in-a-first-upa-govt-tweets-the-press_1745346, I get this error when I try to fetch the data: "Warning: get_meta_tags(http://www.dnaindia.com/india/report_in-a-first-upa-govt-tweets-the-press_1745346): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden in C:\xampp\htdocs\prac\index.php on line 52"
It doesn't work. Any idea why this happens?
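Answer 0: (score: 1)
Sometimes webmasters are not stupid and know how to protect their pages from leeching and scraping, so you have to get past that protection by presenting the user agent of a normal browser. Add the following line: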
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; rv:15.0) Gecko/20100101 Firefox/15.0.1");
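Note that the warning in the question is raised by get_meta_tags(), which opens the URL through PHP's HTTP stream wrapper rather than through the cURL handle, so a user agent set on cURL alone would probably not change that call. One possible way around this, sketched below, is to download the page once with cURL (with the user agent set) and read the meta tags from the HTML you already have. The helper names fetch_html() and extract_meta() are hypothetical and not part of the original class; setting the `user_agent` ini directive before calling get_meta_tags() may be another option.

```php
<?php
// Sketch only: fetch_html() and extract_meta() are hypothetical helper names.

/**
 * Fetch a page with cURL while sending a browser-like User-Agent header.
 * Returns the body as a string, or null if the transfer failed.
 */
function fetch_html(string $url): ?string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_USERAGENT,
        'Mozilla/5.0 (Windows NT 6.1; rv:15.0) Gecko/20100101 Firefox/15.0.1');
    $data = curl_exec($ch);
    curl_close($ch);
    return $data === false ? null : $data;
}

/**
 * Collect the <title> and <meta> values from an HTML string,
 * so no second HTTP request (and no get_meta_tags() call) is needed.
 */
function extract_meta(string $html): array
{
    $meta = [];
    if ($html === '') {
        return $meta;
    }

    $doc = new DOMDocument();
    // Real-world HTML is often invalid; keep parser warnings internal.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $meta['title'] = trim($titles->item(0)->nodeValue);
    }

    foreach ($doc->getElementsByTagName('meta') as $tag) {
        // Handles <meta name="..."> and Open Graph style <meta property="...">.
        $key = $tag->getAttribute('name') ?: $tag->getAttribute('property');
        if ($key !== '') {
            $meta[strtolower($key)] = $tag->getAttribute('content');
        }
    }
    return $meta;
}
```

Under these assumptions, usage would look like `$html = fetch_html($url); $meta = $html !== null ? extract_meta($html) : [];`. A site can still refuse the request for other reasons, so a 403 may persist even with a browser-like user agent.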