PHP: check a download link without downloading the file

Date: 2014-06-14 15:40:34

Tags: php curl

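On my site I have some links for downloading files, and I want to write a PHP script that checks whether a download link is still online. This is the code I am using: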

$cl = curl_init($url);  
curl_setopt($cl,CURLOPT_CONNECTTIMEOUT,10);
curl_setopt($cl,CURLOPT_HEADER,true);
curl_setopt($cl,CURLOPT_NOBODY,true);
curl_setopt($cl,CURLOPT_RETURNTRANSFER,true);

if(!curl_exec($cl)){
    echo 'The download link is offline';
    die();
}

$code = curl_getinfo($cl, CURLINFO_HTTP_CODE);
if($code != 200){
    echo 'The download link is offline';
}else{
    echo 'The download link is online!';
}

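The problem is that it downloads the whole file, which makes it very slow; I only need to check the headers. I saw that cURL has an option CURLOPT_CONNECT_ONLY, but my web host runs PHP 5.4, which does not have that option. Is there any other way?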

2 answers:

Answer 0 (score: 3)

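CURLOPT_CONNECT_ONLY would be good, but it is only available in PHP 5.5 and above. So try get_headers instead. There is also another method that uses fopen, stream_context_create & stream_get_meta_data. First, the get_headers method: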

// Set a test URL.
$url = "https://www.google.com/";

// Get the headers.
$headers = get_headers($url);

// Check if the headers are empty.
if(empty($headers)){
  echo 'The download link is offline';
  die();
}

// Use a regex to see if the response code is 200.
preg_match('/\b200\b/', $headers[0], $matches);

// Act on whether the matches are empty or not.
if(empty($matches)){
  echo 'The download link is offline';
}
else{
  echo 'The download link is online!';
}

// Dump the array of headers for debugging.
echo '<pre>';
print_r($headers);
echo '</pre>';

// Dump the array of matches for debugging.
echo '<pre>';
print_r($matches);
echo '</pre>';

The output, including the dumps for debugging, would be:

The download link is online!

Array
(
    [0] => HTTP/1.0 200 OK
    [1] => Date: Sat, 14 Jun 2014 15:56:28 GMT
    [2] => Expires: -1
    [3] => Cache-Control: private, max-age=0
    [4] => Content-Type: text/html; charset=ISO-8859-1
    [5] => Set-Cookie: PREF=ID=6e3e1a0d528b0941:FF=0:TM=1402761388:LM=1402761388:S=4YKP2U9qC6aMgxpo; expires=Mon, 13-Jun-2016 15:56:28 GMT; path=/; domain=.google.com
    [6] => Set-Cookie: NID=67=Wun72OJYmuA_TQO95WXtbFOK5g-xU53PQZ7dAIBtzCaBWxhXzduHQZfBVPf4LpaK3MVH8ZKbrBIc3-vTKuMlEnMdpWH0mcft5pA_0kCoe4qolDmednpPJqezZF_HyfXD; expires=Sun, 14-Dec-2014 15:56:28 GMT; path=/; domain=.google.com; HttpOnly
    [7] => P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
    [8] => Server: gws
    [9] => X-XSS-Protection: 1; mode=block
    [10] => X-Frame-Options: SAMEORIGIN
    [11] => Alternate-Protocol: 443:quic
)

Array
(
    [0] => 200
)
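
One caveat with checking only $headers[0]: get_headers follows redirects by default, so for a redirected link the first element holds the 3xx status line and the final status appears later in the array. Below is a minimal sketch of a helper that reads the last status line instead. The name final_status_code and the sample data are assumptions for illustration, not part of PHP or of the original answer.

```php
<?php
// Hypothetical helper (not a PHP built-in): return the status code of the
// last status line in a get_headers()-style array, or null if none is found.
function final_status_code($headers)
{
    if (!is_array($headers)) {
        return null; // get_headers() returns false on failure
    }
    $code = null;
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 200 OK".
        if (is_string($line) && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $line, $m)) {
            $code = (int) $m[1]; // later status lines overwrite earlier ones
        }
    }
    return $code;
}

// Sample data only; in practice pass the result of get_headers($url).
$sample = array(
    'HTTP/1.1 301 Moved Permanently',
    'Location: https://example.com/file.zip',
    'HTTP/1.1 200 OK',
    'Content-Type: application/zip',
);

echo final_status_code($sample) === 200
    ? 'The download link is online!'
    : 'The download link is offline';
```

With the sample array, the regex on $headers[0] alone would report the link as offline, while the helper sees the final 200.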

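Here is another method that uses fopen, stream_context_create & stream_get_meta_data. The benefit of this method is that, in addition to the headers, it gives you some information about what was done to fetch the URL: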

// Set a test URL.
$url = "https://www.google.com/";

// Set the stream_context_create options.
$opts = array(
  'http' => array(
    'method' => 'HEAD'
   )
);

// Create context stream with stream_context_create.
$context  = stream_context_create($opts);

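// Use fopen with rb (read binary) set and the context set above.
$handle = fopen($url, 'rb', false, $context);

// fopen returns false if the request fails, so stop here in that case.
if($handle === false){
  echo 'The download link is offline';
  die();
}

// Get the headers with stream_get_meta_data.
$headers = stream_get_meta_data($handle);

// Close the fopen handle.
fclose($handle);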

// Use a regex to see if the response code is 200.
preg_match('/\b200\b/', $headers['wrapper_data'][0], $matches);

// Act on whether the matches are empty or not.
if(empty($matches)){
  echo 'The download link is offline';
}
else{
  echo 'The download link is online!';
}

// Dump the array of headers for debugging.
echo '<pre>';
print_r($headers);
echo '</pre>';

This is the output:

The download link is online!

Array
(
    [wrapper_data] => Array
        (
            [0] => HTTP/1.0 200 OK
            [1] => Date: Sat, 14 Jun 2014 16:14:58 GMT
            [2] => Expires: -1
            [3] => Cache-Control: private, max-age=0
            [4] => Content-Type: text/html; charset=ISO-8859-1
            [5] => Set-Cookie: PREF=ID=32f21aea66dcfd5c:FF=0:TM=1402762498:LM=1402762498:S=NVP-y-kW9DktZPAG; expires=Mon, 13-Jun-2016 16:14:58 GMT; path=/; domain=.google.com
            [6] => Set-Cookie: NID=67=mO_Ihg4TgCTizpySHRPnxuTp514Hou5STn2UBdjvkzMn4GPZ4e9GHhqyIbwap8XuB8SuhjpaY9ZkVinO4vVOmnk_esKKTDBreIZ1sTCsz2yusNLKA9ht56gRO4uq3B9I; expires=Sun, 14-Dec-2014 16:14:58 GMT; path=/; domain=.google.com; HttpOnly
            [7] => P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
            [8] => Server: gws
            [9] => X-XSS-Protection: 1; mode=block
            [10] => X-Frame-Options: SAMEORIGIN
            [11] => Alternate-Protocol: 443:quic
        )

    [wrapper_type] => http
    [stream_type] => tcp_socket/ssl
    [mode] => rb
    [unread_bytes] => 0
    [seekable] => 
    [uri] => https://www.google.com/
    [timed_out] => 
    [blocked] => 1
    [eof] => 
)
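
Two HTTP context options may be worth adding to this approach: timeout, so that a dead host does not stall the script, and ignore_errors, so that fopen still returns a handle on a 4xx or 5xx response and the status line can be inspected. The sketch below assumes those options suit your case; head_context_options and head_status_line are hypothetical names, not PHP built-ins.

```php
<?php
// Hypothetical helper: build HTTP context options for a HEAD request.
function head_context_options($timeoutSeconds = 10)
{
    return array(
        'http' => array(
            'method'        => 'HEAD',
            'timeout'       => (float) $timeoutSeconds, // read timeout in seconds
            'ignore_errors' => true, // keep the response even on 4xx/5xx
        ),
    );
}

// Hypothetical helper: return the first status line of the response,
// or null if the stream could not be opened at all.
function head_status_line($url, $timeoutSeconds = 10)
{
    $context = stream_context_create(head_context_options($timeoutSeconds));

    // Suppress the warning; a false return value is handled below.
    $handle = @fopen($url, 'rb', false, $context);
    if ($handle === false) {
        return null;
    }

    $meta = stream_get_meta_data($handle);
    fclose($handle);

    return isset($meta['wrapper_data'][0]) ? $meta['wrapper_data'][0] : null;
}
```

Calling head_status_line('https://www.google.com/') would be expected to return a status line like the one in the dump above, though the exact value depends on the server.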

Answer 1 (score: 1)

Try adding curl_setopt( $cl, CURLOPT_CUSTOMREQUEST, 'HEAD' ); to send a HEAD request.
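
A sketch of how the question's snippet might look with this option added. CURLOPT_NOBODY is kept alongside it so that cURL does not wait for a response body, and the result of curl_exec is compared against false because a successful header-only request can return an empty string. head_curl_options and curl_link_is_online are hypothetical names, and the sketch assumes the cURL extension is loaded.

```php
<?php
// Hypothetical helper: cURL options for a header-only check.
function head_curl_options($timeoutSeconds = 10)
{
    return array(
        CURLOPT_CUSTOMREQUEST  => 'HEAD', // the option suggested in this answer
        CURLOPT_NOBODY         => true,   // do not wait for or download a body
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
        CURLOPT_RETURNTRANSFER => true,   // do not echo the response
    );
}

// Hypothetical helper: true only if the request succeeds with status 200.
function curl_link_is_online($url, $timeoutSeconds = 10)
{
    $cl = curl_init($url);
    curl_setopt_array($cl, head_curl_options($timeoutSeconds));

    // curl_exec returns false on failure; an empty string is still a success.
    $ok = curl_exec($cl) !== false;
    $code = (int) curl_getinfo($cl, CURLINFO_HTTP_CODE);

    return $ok && $code === 200;
}
```

Usage would be along the lines of: echo curl_link_is_online($url) ? 'The download link is online!' : 'The download link is offline';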