我尝试使用以下代码下载网址http://es.extpdf.com/nagore-pdf.html。但我得到的状态代码为0作为回报。但是当从http://web-sniffer.net/访问它时,它会显示301重定向。我的代码似乎也适用于301重定向的URL。
可能是什么问题?
<?php
print disavow_download_url("http://es.extpdf.com/nagore-pdf.html");
function disavow_download_url($url) {
$custom_headers = array();
$custom_headers[] = "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
$custom_headers[] = "Pragma: no-cache";
$custom_headers[] = "Cache-Control: no-cache";
$custom_headers[] = "Accept-Language: en-us;q=0.7,en;q=0.3";
$custom_headers[] = "Accept-Charset: utf-8,windows-1251;q=0.7,*;q=0.7";
$ch = curl_init();
$useragent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0.1) Gecko/20100101 Firefox/9.0.1";
curl_setopt($ch, CURLOPT_USERAGENT, $useragent); // set user agent
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
//curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_HTTPHEADER, $custom_headers);
//these two from https
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
curl_setopt($ch, CURLOPT_TIMEOUT, 10); //timeout in seconds
$txResult = curl_exec($ch);
$statuscode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
print "statuscode=$statuscode\n";
print "result=$txResult\n";
}
答案 0 :(得分:1)
该网址可从美国访问,而不是从您所在的地区访问。它适用于web-sniffer,因为它们的服务器托管在美国(或者是extpdf允许的某个区域)。
我使用了curl的USA代理,它返回了数据。
curl_setopt($ch, CURLOPT_PROXY, "100.9.90.1:3128"); // change IP, Port