Question

I want to check a large number of URLs and only need their status codes.

So this is what I do:

$handle = curl_init($url->loc);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
curl_setopt($handle, CURLOPT_HEADER, true);  // we want headers
curl_setopt($handle, CURLOPT_NOBODY, true);
curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);
$response = curl_exec($handle);
$httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
curl_close($handle);

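However, as soon as the "nobody" option is set to true, the returned status codes are incorrect (google.com returns 302, other sites return 303).

Setting this option to false is not feasible because of the performance loss.

Any ideas?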

Answer 1

curl's default HTTP request method is GET. If you only need the response headers, you can use the HTTP method HEAD.

curl_setopt($handle, CURLOPT_CUSTOMREQUEST, 'HEAD');

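As @Dai's answer points out, NOBODY already uses the HEAD method, so the approach above will not help.

Another option is to open the connection with fsockopen and write the request headers with fwrite. Read the response with fgets until the first occurrence of \r\n\r\n to get the complete headers. Since you only need the status code, you only need the first 12 characters of the status line (for example, HTTP/1.1 200), which is what fgets($fp, 13) reads.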

<?php
// Open a plain TCP connection to the web server (30 second timeout).
$fp = fsockopen("www.google.com", 80, $errno, $errstr, 30);
if ($fp) {
    // Build a normal GET request by hand.
    $out = "GET / HTTP/1.1\r\n";
    $out .= "Host: www.google.com\r\n";
    $out .= "Accept-Encoding: gzip, deflate, sdch\r\n";
    $out .= "Accept-Language: en-GB,en-US;q=0.8,en;q=0.6\r\n";
    $out .= "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36\r\n";
    $out .= "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fp, $out);
    // fgets() reads at most length - 1 bytes, i.e. "HTTP/1.1 200".
    $tmp = explode(' ', fgets($fp, 13));
    echo $tmp[1]; // the status code
    // Close the socket without reading the rest of the response.
    fclose($fp);
} else {
    echo "$errstr ($errno)";
}
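
Reading a fixed number of characters assumes the status line always starts with an 8-character version string such as HTTP/1.1. As a rough sketch, the status line could instead be parsed with a regular expression. The helper name parseStatusLine is hypothetical and not part of the answer above.

```php
<?php
// Hypothetical helper (an illustration, not part of the original answer):
// extract the status code from a raw status line such as "HTTP/1.1 200 OK".
function parseStatusLine(string $statusLine): ?int
{
    // Matches "HTTP/<version> <three-digit code>", e.g. HTTP/1.0 or HTTP/1.1.
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $statusLine, $m)) {
        return (int) $m[1];
    }
    return null; // the line is not an HTTP status line
}
```

With the socket code above, this would be called on the first full line of the response, for example parseStatusLine(fgets($fp)).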

Answer 2

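cURL's nobody option makes it use the HEAD HTTP verb, and I bet most non-static web applications do not handle that verb correctly, which is why you are seeing different results. I suggest making a normal GET request and discarding the response body.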
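
One possible way to do this without downloading the whole body is sketched below. It sends a normal GET and aborts the transfer as soon as the first chunk of body data arrives, by returning a byte count from the write callback that differs from the chunk length. The function name fetchStatusCode is hypothetical, and this is an assumption about how the suggestion could be implemented, not code from the answer.

```php
<?php
// Hypothetical helper: normal GET request whose body is thrown away.
// Requires the cURL extension.
function fetchStatusCode(string $url): int
{
    $handle = curl_init($url);
    curl_setopt($handle, CURLOPT_HEADER, false); // pass only body data to the callback
    // cURL aborts the transfer when the callback returns a value that is
    // different from strlen($data); returning 0 stops at the first body chunk.
    curl_setopt($handle, CURLOPT_WRITEFUNCTION, function ($handle, string $data): int {
        return 0;
    });
    // curl_exec() reports a write error when a body was cut off; that is
    // expected here, because the status line has already been received.
    curl_exec($handle);
    // Yields 0 if no HTTP response was received at all.
    return (int) curl_getinfo($handle, CURLINFO_HTTP_CODE);
}
```

A call would look like echo fetchStatusCode('http://www.example.com');. Redirects are not followed in this sketch, so a 301 or 302 is reported as such.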

Answer 3

I suggest using get_headers() instead:

<?php
$url = 'http://www.example.com';

print_r(get_headers($url));

print_r(get_headers($url, 1));
?>
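
get_headers() returns the raw header lines rather than a numeric code, and when redirects are followed the array can contain more than one status line. A minimal sketch for pulling out the last status code follows. The helper name finalStatusCode is hypothetical, and the assumption is that the input has the shape of a get_headers() result.

```php
<?php
// Hypothetical helper: return the status code of the last response found in
// an array shaped like the result of get_headers($url) or get_headers($url, 1).
function finalStatusCode(array $headers): ?int
{
    $code = null;
    foreach ($headers as $key => $line) {
        // Status lines sit under integer keys and are plain strings;
        // named headers and array values are skipped.
        if (is_int($key) && is_string($line)
            && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1]; // later status lines overwrite earlier ones
        }
    }
    return $code; // null when no status line was present
}
```

It could be used as finalStatusCode(get_headers($url)), after checking that get_headers() did not return false.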

Getting the HTTP status code with cURL without the body?
