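I am trying to scrape some data from a URL with the help of Simple HTML DOM. But when I start my crawler, it throws this error:
**failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found**
I tried cURL as well, but it also returns a 404 error.
Here is my PHP Simple HTML DOM code: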
function getURLContent($url)
{
    $html = new simple_html_dom();
    $html->load_file($url);
    /* I perform some operations here */
}
And the cURL code:
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_HEADER, false);
$data = curl_exec($curl);
echo $data;
curl_close($curl);
How can I do this?
Thanks in advance.
Answer 0: (score: 0)
Yes, try setting the user agent:
curl_setopt($curl,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
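To make the suggestion above easier to try, here is a minimal sketch of a complete request that sets the user agent and also reads back the HTTP status code, so a 404 can be told apart from a transport failure. The helper names `buildScraperOptions` and `fetchPage` are hypothetical, and following redirects is an assumption added for illustration, not something the question asked for.

```php
<?php
// Hypothetical helper: builds the cURL options for a browser-like request.
function buildScraperOptions(string $url, string $userAgent): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_HEADER         => false,  // keep response headers out of the body
        CURLOPT_FOLLOWLOCATION => true,   // assumption: follow redirects
        CURLOPT_USERAGENT      => $userAgent,
    ];
}

// Hypothetical helper: returns [HTTP status code, body or null on transport error].
function fetchPage(string $url, string $userAgent): array
{
    $curl = curl_init();
    curl_setopt_array($curl, buildScraperOptions($url, $userAgent));
    $body = curl_exec($curl);
    $status = (int) curl_getinfo($curl, CURLINFO_HTTP_CODE);
    // The handle is released when $curl goes out of scope.
    return [$status, $body === false ? null : $body];
}
```

Calling `fetchPage($url, $userAgent)` and checking the returned status before parsing the body should show whether the user agent change is enough. If the status is still 404, the URL itself is probably wrong for that server.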
Answer 1: (score: 0)
Add these to your code and try:
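curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1");
curl_setopt($curl, CURLOPT_HEADER, false); // boolean flag, not a URL; true would include response headers in the output
curl_setopt($curl, CURLOPT_HTTPHEADER, $headers); // $headers must be an array of request header strings
curl_setopt($curl, CURLOPT_AUTOREFERER, true);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false); // disables certificate verification; leave it enabled for HTTPS URLs where possible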
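The same user-agent idea can probably be applied to the Simple HTML DOM path from the question as well. The sketch below is an assumption rather than a confirmed fix: it builds a stream context with a browser-like user agent (the helper name `buildBrowserContext` is hypothetical), fetches the page with `file_get_contents`, and leaves the parsing to the library.

```php
<?php
// Hypothetical helper: HTTP stream context with a browser-like User-Agent.
function buildBrowserContext(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'user_agent'    => $userAgent,
            'ignore_errors' => true,  // still return the body on 4xx/5xx responses
        ],
    ]);
}

// Assumed usage with Simple HTML DOM (requires the library, not run here):
// $body = file_get_contents($url, false, buildBrowserContext($userAgent));
// $html = new simple_html_dom();
// $html->load($body);
```

With `ignore_errors` enabled, `file_get_contents` returns the error page instead of raising the "failed to open stream" warning, which makes it easier to inspect what the server actually sent.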
Answer 2: (score: 0)