Question

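I'm using PHP to pull data from one of our sites into another, database-backed site. Part of that involves moving files as I find them referenced in the HTML.

One aspect of this is needing to check that the file exists and that it is not HTML (meaning there is an actual file located at that URL).

Using get_headers takes a long time for a 2.2MB PDF. I tried to do the same thing with the following cURL request: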

    public function getHeaders( $url ){
        $ch = curl_init();
        curl_setopt( $ch, CURLOPT_URL, $url );
        //curl_setopt( $ch, CURLOPT_RETURNTRANSFER, 1 );
        //curl_setopt( $ch, CURLOPT_VERBOSE, 0 );
        //curl_setopt( $ch, CURLOPT_HEADER, 1 );
        curl_setopt( $ch, CURLOPT_CUSTOMREQUEST, 'HEAD' );
        curl_exec( $ch );
        $info = curl_getinfo( $ch );
        curl_close( $ch );
        return $info;
    }

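The problem here is that it simply takes a very long time (~20+ seconds) to get the headers back. Once I know the URL is a file and returns a 200, I'll go back, download it, and insert it into my new database.

Any ideas on how to get the headers better and faster? Thanks.

====== Edit 10:30a CDT 4/20/2015 ======

Sample code running the suggested methods: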

<?php

//$file = 'http://www.pmi.org/Certification/~/media/PDF/Certifications/pdc_pmphandbook.ashx';
$file = 'https://www.projectmanagement-training.net/download/book_project_management.pdf';

print( 'Starting CURL Method : ' );
$time_start = microtime( true ); 
$headers = getHeaders( $file );
$execution_time = round( ( microtime( true ) - $time_start )/60, 8 );
print ( $execution_time . ' seconds <br />' );
print( '<pre>' . print_r( $headers, true ) . '</pre>' );



print( 'Starting get_headers() Method : ' );
$time_start = microtime( true ); 
$headers = get_headers( $file );
$execution_time = round( ( microtime( true ) - $time_start )/60, 8 );
print ( $execution_time . ' seconds <br />' );
print( '<pre>' . print_r( $headers, true ) . '</pre>' );



print( 'Starting get_headers() with context type Method : ' );
$time_start = microtime( true ); 
stream_context_set_default( array( 'http' => array( 'method' => 'HEAD', 'ignore_errors' => true ) ) );
$headers = get_headers( $file );
$execution_time = round( ( microtime( true ) - $time_start )/60, 8 );
print ( $execution_time . ' seconds <br />' );
print( '<pre>' . print_r( $headers, true ) . '</pre>' );



print( 'Starting file_get_contents Method : ' );
$time_start = microtime( true ); 
$context = stream_context_create( array( 'http' => array( 'method' => 'HEAD', 'ignore_errors' => true ) ) );
$file = file_get_contents( $file, false, $context );
$execution_time = round( ( microtime( true ) - $time_start )/60, 8 );
print ( $execution_time . ' seconds <br />' );
print( '<pre>' . print_r( $http_response_header, true ) . '</pre>' );

function getHeaders( $url ){
    $ch = curl_init();
    curl_setopt( $ch, CURLOPT_URL, $url );
    //curl_setopt( $ch, CURLOPT_RETURNTRANSFER, 1 );
    //curl_setopt( $ch, CURLOPT_VERBOSE, 0 );
    //curl_setopt( $ch, CURLOPT_HEADER, 1 );
    curl_setopt( $ch, CURLOPT_CUSTOMREQUEST, 'HEAD' );
    curl_exec( $ch );
    $info = curl_getinfo( $ch );
    curl_close( $ch );
    return $info;
}

?>

Output:

Starting CURL Method : 0.01373608 seconds 
Array
(
    [url] => https://www.projectmanagement-training.net/download/book_project_management.pdf
    [content_type] => 
    [http_code] => 0
    [header_size] => 0
    [request_size] => 0
    [filetime] => -1
    [ssl_verify_result] => 1
    [redirect_count] => 0
    [total_time] => 0.202
    [namelookup_time] => 0
    [connect_time] => 0.124
    [pretransfer_time] => 0
    [size_upload] => 0
    [size_download] => 0
    [speed_download] => 0
    [speed_upload] => 0
    [download_content_length] => -1
    [upload_content_length] => -1
    [starttransfer_time] => 0
    [redirect_time] => 0
    [redirect_url] => 
    [primary_ip] => 81.169.145.64
    [certinfo] => Array
        (
        )

    [primary_port] => 443
    [local_ip] => 127.0.0.1
    [local_port] => 62741
)
Starting get_headers() Method : 0.03559045 seconds 
Array
(
    [0] => HTTP/1.1 200 OK
    [1] => Date: Mon, 20 Apr 2015 15:28:28 GMT
    [2] => Server: Apache/2.2.29 (Unix)
    [3] => X-Powered-By: PHP/5.3.29
    [4] => Content-Disposition: attachment; filename="book_project_management.pdf"
    [5] => Content-Type: application/pdf
    [6] => Connection: close
)
Starting get_headers() with context type Method : 0.03277322 seconds 
Array
(
    [0] => HTTP/1.1 200 OK
    [1] => Date: Mon, 20 Apr 2015 15:28:30 GMT
    [2] => Server: Apache/2.2.29 (Unix)
    [3] => X-Powered-By: PHP/5.3.29
    [4] => Content-Disposition: attachment; filename="book_project_management.pdf"
    [5] => Content-Type: application/pdf
    [6] => Connection: close
)
Starting file_get_contents Method : 0.04345868 seconds 
Array
(
    [0] => HTTP/1.1 200 OK
    [1] => Date: Mon, 20 Apr 2015 15:28:33 GMT
    [2] => Server: Apache/2.2.29 (Unix)
    [3] => X-Powered-By: PHP/5.3.29
    [4] => Content-Disposition: attachment; filename="book_project_management.pdf"
    [5] => Content-Type: application/pdf
    [6] => Connection: close
)
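
A note on reading the timings above: microtime( true ) already returns seconds, so dividing the difference by 60 most likely means the printed figures are minutes, even though they are labelled "seconds". Below is a minimal sketch of a timing wrapper that reports seconds directly. The name timeCall and the stand-in workload are made up for illustration and are not part of the original script.

```php
<?php
// Hypothetical helper: times any callable and reports elapsed wall-clock seconds.
function timeCall(callable $fn): array
{
    $start   = microtime(true);            // float seconds since the Unix epoch
    $result  = $fn();
    $elapsed = microtime(true) - $start;   // already in seconds, no division needed
    return ['seconds' => round($elapsed, 4), 'result' => $result];
}

// Converting one of the printed figures back to seconds, assuming it was
// produced by the "/60" expression in the script above.
$reported      = 0.01373608;
$actualSeconds = round($reported * 60, 4);
echo $actualSeconds, " seconds\n";

// Stand-in workload (an assumption for the demo): sleep for 50 ms.
$timing = timeCall(function () {
    usleep(50000);
    return 'done';
});
echo $timing['seconds'], " seconds for the stand-in workload\n";
```

In the real script the closure would wrap getHeaders( $file ) or get_headers( $file ) instead of the sleep.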

Answer 1

If your goal is just to get the headers, why not use PHP's built-in function for that? :)

http://php.net/manual/en/function.get-headers.php
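
Since the question is really "is this a 200 and not HTML?", here is a rough sketch of how the array returned by get_headers() could be checked. The function name looksLikeRealFile is hypothetical, and two choices are assumptions on my part: only text/html counts as "HTML", and a response with no Content-Type is treated as not a file.

```php
<?php
// Hypothetical helper: decides from get_headers()-style lines whether the URL
// answered 200 with a non-HTML Content-Type.
function looksLikeRealFile(array $headerLines): bool
{
    $status      = null;
    $contentType = null;

    foreach ($headerLines as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            // A redirect chain yields several status lines; keep only the last
            // response, so reset the content type seen so far.
            $status      = (int) $m[1];
            $contentType = null;
        } elseif (stripos($line, 'Content-Type:') === 0) {
            $contentType = strtolower(trim(substr($line, strlen('Content-Type:'))));
        }
    }

    // Assumption: a missing Content-Type is treated as "not a file".
    if ($status !== 200 || $contentType === null) {
        return false;
    }

    // Assumption: only text/html is considered an HTML page.
    return strpos($contentType, 'text/html') !== 0;
}

// Header lines copied from the output shown in the question.
$pdfHeaders = [
    'HTTP/1.1 200 OK',
    'Server: Apache/2.2.29 (Unix)',
    'Content-Type: application/pdf',
    'Connection: close',
];
var_dump(looksLikeRealFile($pdfHeaders));
```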

Answer 2

file_get_contents may be a quicker way, because its context options let you ask for only the header information:

<?php
    $url = "http://static.adzerk.net/Advertisers/831a088cf67e42c580e407e2d91c8ce6.jpg";

    $options = [
          'http' => [
               'method' => "HEAD",
               'ignore_errors' => 1
                ]
    ];

    $context = stream_context_create($options);
    $file = file_get_contents($url, false, $context);
    print_r($http_response_header);
?>

Although, as mentioned above, PHP's stock function http://php.net/manual/en/function.get-headers.php may do the trick :)

Answer 3

Check these times in the $info array. They will tell you where the time is being spent:

CURLINFO_NAMELOOKUP_TIME
CURLINFO_CONNECT_TIME
CURLINFO_PRETRANSFER_TIME
CURLINFO_STARTTRANSFER_TIME
CURLINFO_SPEED_DOWNLOAD
CURLINFO_TOTAL_TIME
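
To make those numbers easier to read: the timing values from curl_getinfo() are cumulative (each is measured from the start of the request), so subtracting neighbouring values gives the time spent in each phase. A small sketch follows, using the lower-case array keys that curl_getinfo() returns. The function name curlPhaseDurations and the sample values are invented for illustration, not measurements from the question.

```php
<?php
// Hypothetical helper: turns curl_getinfo()'s cumulative timestamps into
// per-phase durations in seconds. Negative gaps (e.g. from a request that
// failed early) are clamped to zero.
function curlPhaseDurations(array $info): array
{
    $phase = function (float $end, float $start): float {
        return round(max(0.0, $end - $start), 3);
    };

    return [
        'dns_lookup' => $phase($info['namelookup_time'], 0.0),
        'connect'    => $phase($info['connect_time'], $info['namelookup_time']),
        'handshake'  => $phase($info['pretransfer_time'], $info['connect_time']),
        'waiting'    => $phase($info['starttransfer_time'], $info['pretransfer_time']),
        'transfer'   => $phase($info['total_time'], $info['starttransfer_time']),
    ];
}

// Made-up sample values shaped like a slow request.
$sample = [
    'namelookup_time'    => 0.015,
    'connect_time'       => 0.124,
    'pretransfer_time'   => 0.310,
    'starttransfer_time' => 0.480,
    'total_time'         => 20.480,
];
print_r(curlPhaseDurations($sample));
```

If nearly all of the time lands in 'transfer', the server answered quickly and the client spent the rest waiting on the response body.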

Test the link with these two sites:

http://www.webpagetest.org/ and
http://gtmetrix.com/

If you use get_headers(), set its defaults with stream_context_set_default().

get_headers() uses the default stream context, which stream_context_set_default() changes, so this is a valid option.

    stream_context_set_default(
        array(
            'http' => array(
                'method' => 'HEAD'
            )
        )
    );
    $headers = get_headers('http://example.com');
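
One thing to keep in mind is that stream_context_set_default() changes the default for every later stream call in the script. As an alternative sketch, assuming PHP 7.1 or newer (where get_headers() accepts a context as its third argument), the HEAD context can be passed per call. The names makeHeadContext and headersViaStream and the 10-second timeout are my own choices for illustration.

```php
<?php
// Hypothetical helper: builds a HEAD-only context for a single call instead of
// changing the script-wide default.
function makeHeadContext(float $timeoutSeconds = 10.0)
{
    return stream_context_create([
        'http' => [
            'method'  => 'HEAD',           // ask for headers only
            'timeout' => $timeoutSeconds,  // read timeout in seconds
        ],
    ]);
}

// Hypothetical wrapper; requires PHP 7.1+ for the third argument.
// Returns an associative array of headers, or false on failure.
function headersViaStream(string $url)
{
    return get_headers($url, true, makeHeadContext());
}

// Inspect the options without touching the network.
print_r(stream_context_get_options(makeHeadContext(5.0)));
```

With the associative format, a header that appears more than once (for example across redirects) comes back as an array of values rather than a string.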

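RE: curl

You are not getting the response headers because this line is commented out:

//curl_setopt( $ch, CURLOPT_HEADER, 1 );

Also, you never capture the data returned by curl_exec(), which is where the response headers are.

Set the following options: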

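curl_setopt($ch, CURLOPT_HEADER, true);          // include the response headers in the returned data
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // make curl_exec() return the data instead of printing it
curl_setopt($ch, CURLINFO_HEADER_OUT, true);     // record the outgoing request headers for curl_getinfo()
curl_setopt($ch, CURLOPT_VERBOSE, true);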

You also need to add timeouts and enable fail-on-error:

curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_TIMEOUT,10);
curl_setopt($ch, CURLOPT_FAILONERROR,true);
curl_setopt($ch, CURLOPT_ENCODING,"");



$data = curl_exec($ch);

if (curl_errno($ch)) {
    $info['error'] = curl_error($ch);
} else {
    // CURLINFO_HEADER_SIZE is the byte length of the response headers at the start of $data
    $skip = intval(curl_getinfo($ch, CURLINFO_HEADER_SIZE));
    $responseHeader = substr($data, 0, $skip);
    $info = curl_getinfo($ch);
    $info['responseHeader'] = $responseHeader;
}
return $info;
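
Putting the pieces together, here is a rough sketch of a HEAD request with cURL. It swaps CURLOPT_CUSTOMREQUEST for CURLOPT_NOBODY, which is the option documented for HEAD requests. CURLOPT_CUSTOMREQUEST only changes the method name, so cURL may still wait for a body that never arrives, and that could be the cause of the ~20 second delay (I have not verified this against the asker's server). The function names parseRawHeaders and fetchHeadersWithCurl, the 10-second timeouts and the choice to follow redirects are all assumptions for illustration.

```php
<?php
// Hypothetical helper: parses the raw header text returned by cURL into a
// status code plus a lower-cased name => value map. When redirects were
// followed there are several header blocks; only the last one is kept.
function parseRawHeaders(string $raw): array
{
    $blocks = preg_split('/\r?\n\r?\n/', trim($raw));
    $lines  = preg_split('/\r?\n/', end($blocks));

    $status = 0;
    if (preg_match('#^HTTP/\S+\s+(\d{3})#', $lines[0], $m)) {
        $status = (int) $m[1];
    }

    $headers = [];
    foreach (array_slice($lines, 1) as $line) {
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $headers[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    }
    return ['status' => $status, 'headers' => $headers];
}

// Hypothetical wrapper around the cURL calls; not invoked here because it
// needs network access and the curl extension.
function fetchHeadersWithCurl(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,  // send a HEAD request, expect no body
        CURLOPT_HEADER         => true,  // include response headers in the result
        CURLOPT_RETURNTRANSFER => true,  // return the result instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // assumption: redirects should be followed
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => 10,
    ]);

    $raw = curl_exec($ch);
    if ($raw === false) {
        $result = ['error' => curl_error($ch)];
    } else {
        $result = parseRawHeaders($raw) + ['info' => curl_getinfo($ch)];
    }
    curl_close($ch);
    return $result;
}

// Offline demo of the parser with a redirect followed by the final response.
$rawSample = "HTTP/1.1 301 Moved Permanently\r\nLocation: /download/book.pdf\r\n\r\n"
           . "HTTP/1.1 200 OK\r\nContent-Type: application/pdf\r\nConnection: close\r\n\r\n";
print_r(parseRawHeaders($rawSample));
```

Usage would be something like $result = fetchHeadersWithCurl( $file ); followed by a check that $result['status'] is 200 and $result['headers']['content-type'] is not HTML before downloading.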

Using PHP, get the headers for a large file URL

3 answers: