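I'm checking whether an XML sitemap exists at various URLs. If I supply a URL such as example.com/sitemap.xml and it 301-redirects to www.example.com/sitemap.xml, I obviously get a 301. If www.example.com/sitemap.xml doesn't exist, I never see the 404. So when I get a 301, I run a second cURL request to see whether www.example.com/sitemap.xml returns a 404. But for some reason I get seemingly random 404 and 303 status codes.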
private function check_http_status($domain,$file){
    $url = $domain . "/" . $file;
    $curl = new Curl();
    $curl->url = $url;
    $curl->nobody = true;
    $curl->userAgent = 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.1) Gecko/20060601 Firefox/2.0.0.1 (Ubuntu-edgy)';
    $curl->execute();
    $retcode = $curl->httpCode();
    if ($retcode == 301 || $retcode == 302){
        $url = "www." . $domain . "/" . $file;
        $curl = new Curl();
        $curl->url = $url;
        $curl->nobody = true;
        $curl->userAgent = 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.1) Gecko/20060601 Firefox/2.0.0.1 (Ubuntu-edgy)';
        $curl->execute();
        $retcode = $curl->httpCode();
    }
    return $retcode;
}
Answer 0 (score: 2)
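Take a look at the list of response status codes: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html.
Normally a web browser handles these for you automatically, but when you drive curl by hand you need to know what each response means. 301 or 302 means you should use the alternate URL supplied in the response to reach the resource. For your request that may be as simple as adding www, but it can be more involved when the redirect points to a different domain altogether.
303 means the resource should be fetched from the URL given in the response using GET; it is typically sent in reply to a POST.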
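To make the distinction concrete, here is a minimal sketch of how a client might choose the method for the follow-up request. `follow_up_method` is a hypothetical helper, not part of the Curl wrapper used in the question, and it simplifies one point: for 301 and 302 a client is allowed to turn a POST into a GET, whereas this sketch simply reuses the original method.

```php
<?php
// Hypothetical helper (an assumption for illustration): given the status
// code of a response and the method that was used, return the method for
// the follow-up request to the Location URL, or null if the status code
// is not one of the redirects discussed here.
function follow_up_method(int $code, string $method): ?string
{
    switch ($code) {
        case 301: // Moved Permanently
        case 302: // Found
            // Simplification: reuse the original method.
            return $method;
        case 303: // See Other
            // Retrieve the other URL with GET (a HEAD request stays HEAD).
            return $method === 'HEAD' ? 'HEAD' : 'GET';
        default:
            return null; // e.g. 200 or 404: nothing to follow
    }
}

var_dump(follow_up_method(303, 'POST'));
var_dump(follow_up_method(404, 'HEAD'));
```

A null result means the status code is final for the purposes of the check, so a 404 returned here is the answer for that URL.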
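Answer 1 (score: 0)
Well, when you receive a 301 or 302 you should use the location given in the response, rather than assuming some other location and trying that.
As you can see in this example, the server's response contains the file's new location. Use it for the next request: http://en.wikipedia.org/wiki/HTTP_301#Example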
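As a rough illustration of using the location from the response, the sketch below pulls the Location value out of a raw header block. `extract_location` and the sample headers are made up for this example. With PHP's curl extension you could obtain such a block by enabling CURLOPT_HEADER, or skip the parsing and ask curl_getinfo() for CURLINFO_REDIRECT_URL.

```php
<?php
// Hypothetical helper (an assumption for illustration): return the value
// of the first Location header in a raw HTTP header block, or null if
// there is none.
function extract_location(string $rawHeaders): ?string
{
    // Header names are case-insensitive, hence the "i" flag; the "m" flag
    // lets ^ and $ match at the start and end of each header line.
    if (preg_match('/^Location:\s*(.+?)\s*$/mi', $rawHeaders, $m)) {
        return $m[1];
    }
    return null;
}

// Sample response headers, invented for this example.
$headers = "HTTP/1.1 301 Moved Permanently\r\n"
         . "Location: http://www.example.com/sitemap.xml\r\n"
         . "Content-Length: 0\r\n\r\n";

echo extract_location($headers), "\n";
```

The returned URL is what the second request should target, instead of a guessed "www." prefix.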
Answer 2 (score: 0)
"followLocation" works very well. Here is how I implemented it:
$url = "http://www.YOURSITE.com/"; // Assign your URL here.
$ch = curl_init(); // Initialize curl.
curl_setopt($ch, CURLOPT_URL, $url); // Set the target URL.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // 1 returns the response as a string; 0 prints it directly.
curl_setopt($ch, CURLOPT_HEADER, 0); // 0 leaves the response headers out of the output; 1 would include them.
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // Follow any Location header the server sends.
$response_data = curl_exec($ch); // Execute the request.
$http_header = curl_getinfo($ch); // Get information about the last transfer, i.e. the final request after redirects.
// Flag URLs that do not end in 200 OK.
if ($http_header['http_code'] != 200) {
    echo " <b> PAGE NOT FOUND => </b>"; print $http_header['http_code'];
}
// print $http_header['url']; // The last effective URL, i.e. the page you were redirected to.
print $url; // The original URL you tried to access.
curl_close($ch); // Done with curl, so close the handle.