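I am using PHP and a database to pull data from one of our sites into another. Part of that involves moving files when I find them referenced in the HTML.
One aspect of this is checking whether the file exists and whether it is something other than HTML (meaning an actual file is located at that URL).
Using get_headers takes a long time for a 2.2MB PDF. I tried to do the same thing with the following CURL request: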
public function getHeaders( $url ){
$ch = curl_init();
curl_setopt( $ch, CURLOPT_URL, $url );
//curl_setopt( $ch, CURLOPT_RETURNTRANSFER, 1 );
//curl_setopt( $ch, CURLOPT_VERBOSE, 0 );
//curl_setopt( $ch, CURLOPT_HEADER, 1 );
curl_setopt( $ch, CURLOPT_CUSTOMREQUEST, 'HEAD' );
curl_exec( $ch );
$info = curl_getinfo( $ch );
curl_close( $ch );
return $info;
}
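The problem here is that it simply takes a long time (~20+ seconds) to get the headers back. Once I know the URL is a file and returns a 200, I will go back, download it, and insert it into my new database.
Any ideas on how to get the headers better and faster? Thanks.
====== EDIT 10:30a CDT 4/20/2015 ======
Sample code that runs the suggested methods: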
<?php
//$file = 'http://www.pmi.org/Certification/~/media/PDF/Certifications/pdc_pmphandbook.ashx';
$file = 'https://www.projectmanagement-training.net/download/book_project_management.pdf';
print( 'Starting CURL Method : ' );
$time_start = microtime( true );
$headers = getHeaders( $file );
$execution_time = round( ( microtime( true ) - $time_start )/60, 8 );
print ( $execution_time . ' seconds <br />' );
print( '<pre>' . print_r( $headers, true ) . '</pre>' );
print( 'Starting get_headers() Method : ' );
$time_start = microtime( true );
$headers = get_headers( $file );
$execution_time = round( ( microtime( true ) - $time_start )/60, 8 );
print ( $execution_time . ' seconds <br />' );
print( '<pre>' . print_r( $headers, true ) . '</pre>' );
print( 'Starting get_headers() with context type Method : ' );
$time_start = microtime( true );
stream_context_set_default( array( 'http' => array( 'method' => 'HEAD', 'ignore_errors' => true ) ) );
$headers = get_headers( $file );
$execution_time = round( ( microtime( true ) - $time_start )/60, 8 );
print ( $execution_time . ' seconds <br />' );
print( '<pre>' . print_r( $headers, true ) . '</pre>' );
print( 'Starting file_get_contents Method : ' );
$time_start = microtime( true );
$context = stream_context_create( array( 'http' => array( 'method' => 'HEAD', 'ignore_errors' => true ) ) );
$file = file_get_contents( $file, false, $context );
$execution_time = round( ( microtime( true ) - $time_start )/60, 8 );
print ( $execution_time . ' seconds <br />' );
print( '<pre>' . print_r( $http_response_header, true ) . '</pre>' );
function getHeaders( $url ){
$ch = curl_init();
curl_setopt( $ch, CURLOPT_URL, $url );
//curl_setopt( $ch, CURLOPT_RETURNTRANSFER, 1 );
//curl_setopt( $ch, CURLOPT_VERBOSE, 0 );
//curl_setopt( $ch, CURLOPT_HEADER, 1 );
curl_setopt( $ch, CURLOPT_CUSTOMREQUEST, 'HEAD' );
curl_exec( $ch );
$info = curl_getinfo( $ch );
curl_close( $ch );
return $info;
}
?>
Output:
Starting CURL Method : 0.01373608 seconds
Array
(
[url] => https://www.projectmanagement-training.net/download/book_project_management.pdf
[content_type] =>
[http_code] => 0
[header_size] => 0
[request_size] => 0
[filetime] => -1
[ssl_verify_result] => 1
[redirect_count] => 0
[total_time] => 0.202
[namelookup_time] => 0
[connect_time] => 0.124
[pretransfer_time] => 0
[size_upload] => 0
[size_download] => 0
[speed_download] => 0
[speed_upload] => 0
[download_content_length] => -1
[upload_content_length] => -1
[starttransfer_time] => 0
[redirect_time] => 0
[redirect_url] =>
[primary_ip] => 81.169.145.64
[certinfo] => Array
(
)
[primary_port] => 443
[local_ip] => 127.0.0.1
[local_port] => 62741
)
Starting get_headers() Method : 0.03559045 seconds
Array
(
[0] => HTTP/1.1 200 OK
[1] => Date: Mon, 20 Apr 2015 15:28:28 GMT
[2] => Server: Apache/2.2.29 (Unix)
[3] => X-Powered-By: PHP/5.3.29
[4] => Content-Disposition: attachment; filename="book_project_management.pdf"
[5] => Content-Type: application/pdf
[6] => Connection: close
)
Starting get_headers() with context type Method : 0.03277322 seconds
Array
(
[0] => HTTP/1.1 200 OK
[1] => Date: Mon, 20 Apr 2015 15:28:30 GMT
[2] => Server: Apache/2.2.29 (Unix)
[3] => X-Powered-By: PHP/5.3.29
[4] => Content-Disposition: attachment; filename="book_project_management.pdf"
[5] => Content-Type: application/pdf
[6] => Connection: close
)
Starting file_get_contents Method : 0.04345868 seconds
Array
(
[0] => HTTP/1.1 200 OK
[1] => Date: Mon, 20 Apr 2015 15:28:33 GMT
[2] => Server: Apache/2.2.29 (Unix)
[3] => X-Powered-By: PHP/5.3.29
[4] => Content-Disposition: attachment; filename="book_project_management.pdf"
[5] => Content-Type: application/pdf
[6] => Connection: close
)
Answer 0 (score: 1)
If your goal is only to get the headers, why not use the PHP built-in function for this? :)
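As a minimal sketch of what this answer suggests, the header lines returned by the built-in get_headers() could be checked for "exists and is not HTML". The helper name looksLikeRealFile() is hypothetical, and treating every non-text/html Content-Type on a 200 response as a real file is an assumption about what the question needs.

```php
<?php
/**
 * Hypothetical helper: decide from a get_headers()-style list whether the
 * URL answered 200 with a non-HTML Content-Type.
 *
 * @param string[] $headers Raw header lines, e.g. from get_headers($url).
 */
function looksLikeRealFile(array $headers): bool
{
    $status = 0;
    $contentType = '';
    foreach ($headers as $line) {
        // A new status line appears after each redirect; keep the last one.
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
            $contentType = '';
        } elseif (stripos($line, 'Content-Type:') === 0) {
            $contentType = strtolower(trim(substr($line, strlen('Content-Type:'))));
        }
    }
    // Assumption: anything other than text/html counts as a downloadable file.
    return $status === 200
        && $contentType !== ''
        && strpos($contentType, 'text/html') !== 0;
}

// Usage (performs a network request):
// $headers = @get_headers($url);
// $ok = is_array($headers) && looksLikeRealFile($headers);
```

Keeping the decision in a function that only looks at header lines means it can be reused with whichever method ends up fetching the headers.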
Answer 1 (score: 0)
file_get_contents may be a quicker way, because its context options let you request only the header information:
<?php
$url = "http://static.adzerk.net/Advertisers/831a088cf67e42c580e407e2d91c8ce6.jpg";
$options = [
'http' => [
'method' => "HEAD",
'ignore_errors' => 1
]
];
$context = stream_context_create($options);
$file = file_get_contents($url, false, $context);
print_r($http_response_header);
?>
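Although, as mentioned above, PHP's stock function, http://php.net/manual/en/function.get-headers.php, may well do the trick :)
Answer 2 (score: 0)
Check these timings in the $info array. They will tell you where the time is being spent: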
CURLINFO_NAMELOOKUP_TIME
CURLINFO_CONNECT_TIME
CURLINFO_PRETRANSFER_TIME
CURLINFO_STARTTRANSFER_TIME
CURLINFO_SPEED_DOWNLOAD
CURLINFO_TOTAL_TIME
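To make those timings easier to read, here is a rough sketch that converts them into per-phase durations. The function name timingBreakdown() and the phase labels are hypothetical. It relies on the libcurl timers being cumulative (each one measured from the start of the request), and it reads the lowercase keys that curl_getinfo() returns when called without a second argument.

```php
<?php
/**
 * Hypothetical helper: turn the cumulative timings reported by
 * curl_getinfo() into per-phase durations, in seconds.
 */
function timingBreakdown(array $info): array
{
    $dns     = (float) ($info['namelookup_time'] ?? 0);
    $connect = (float) ($info['connect_time'] ?? 0);
    $pre     = (float) ($info['pretransfer_time'] ?? 0);
    $start   = (float) ($info['starttransfer_time'] ?? 0);
    $total   = (float) ($info['total_time'] ?? 0);

    // Each timer counts from the start of the request, so the difference
    // between consecutive timers is the time spent in that phase.
    // max() guards against timers that were never reached and stayed at 0.
    return [
        'dns'       => $dns,
        'connect'   => max(0.0, $connect - $dns),
        'tls_setup' => max(0.0, $pre - $connect),
        'wait'      => max(0.0, $start - $pre),
        'transfer'  => max(0.0, $total - $start),
    ];
}

// Usage: $phases = timingBreakdown(curl_getinfo($ch));
```

If nearly all of the time lands in the last phase, the request is probably waiting on data after the headers have already arrived rather than on DNS or the connection.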
Test the link with these two sites:
http://www.webpagetest.org/
and
http://gtmetrix.com/
If you use get_headers(), set its defaults with stream_context_set_default(). get_headers() uses the default stream context, so this is a valid option.
stream_context_set_default(
array(
'http' => array(
'method' => 'HEAD'
)
)
);
$headers = get_headers('http://example.com');
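Because stream_context_set_default() changes the default for every later stream call in the script, an alternative is to pass a context to get_headers() directly, which is possible from PHP 7.1 on. The sketch below assumes that version; the helper name headContext() and the 5-second timeout are illustrative choices, not part of the answer.

```php
<?php
/**
 * Hypothetical helper: build a HEAD-only stream context with a timeout.
 * The 5-second default is an arbitrary illustration value.
 *
 * @return resource A stream context usable with get_headers() or file_get_contents().
 */
function headContext(float $timeoutSeconds = 5.0)
{
    return stream_context_create([
        'http' => [
            'method'  => 'HEAD',          // ask for headers only
            'timeout' => $timeoutSeconds, // give up instead of hanging
        ],
    ]);
}

// Usage (performs a network request; the third argument needs PHP 7.1+):
// $headers = get_headers('http://example.com', false, headContext());
```

This keeps the HEAD setting local to one call, so other requests in the same script are not silently turned into HEAD requests.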
RE: curl
You will not get the headers while this line is commented out:
//curl_setopt( $ch, CURLOPT_HEADER, 1 );
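In addition, you never retrieve the data that contains the response headers.
Set the following options:
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response from curl_exec() instead of printing it
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
curl_setopt($ch, CURLOPT_VERBOSE, true);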
You also need to add timeouts and enable fail-on-error:
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_TIMEOUT,10);
curl_setopt($ch, CURLOPT_FAILONERROR,true);
curl_setopt($ch, CURLOPT_ENCODING,"");
$data = curl_exec($ch);
if (curl_errno($ch)){
    $info['error'] = curl_error($ch);
}
else {
    // CURLINFO_HEADER_SIZE is the length of the response headers at the start of $data
    $skip = intval(curl_getinfo($ch, CURLINFO_HEADER_SIZE));
    $responseHeader = substr($data,0,$skip);
    $info = curl_getinfo($ch);
    $info['responseHeader'] = $responseHeader;
}
return $info;
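Pulling the pieces of this answer together, one possible shape for the whole function is sketched below. It swaps CURLOPT_CUSTOMREQUEST for CURLOPT_NOBODY, which is my assumption rather than something the answer states: when only the request method string is changed, libcurl may still wait for a response body that a HEAD response never sends, and that could explain the long delay. The names fetchHeadInfo() and parseHeaderBlock(), the 'response' key, and the choice to follow redirects are all hypothetical.

```php
<?php
/**
 * Hypothetical helper: split a raw header block into a status code and
 * lower-cased header names. Only the last response is kept when the block
 * contains several (for example after redirects).
 */
function parseHeaderBlock(string $raw): array
{
    $result = ['status' => 0, 'headers' => []];
    foreach (preg_split('/\r?\n/', trim($raw)) as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            // A status line starts a new response; discard earlier headers.
            $result = ['status' => (int) $m[1], 'headers' => []];
        } elseif (strpos($line, ':') !== false) {
            [$name, $value] = explode(':', $line, 2);
            $result['headers'][strtolower(trim($name))] = trim($value);
        }
    }
    return $result;
}

/**
 * Hypothetical wrapper: issue a HEAD request and return curl_getinfo()
 * plus the parsed response headers, or an 'error' key on failure.
 */
function fetchHeadInfo(string $url, int $timeoutSeconds = 10): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true, // send HEAD and do not wait for a body
        CURLOPT_HEADER         => true, // include response headers in the result
        CURLOPT_RETURNTRANSFER => true, // return the result instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // assumption: redirects should be followed
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    $raw = curl_exec($ch);
    if ($raw === false) {
        $info = ['error' => curl_error($ch)];
    } else {
        $info = curl_getinfo($ch);
        $info['response'] = parseHeaderBlock((string) $raw);
    }
    curl_close($ch);
    return $info;
}

// Usage (performs a network request):
// $info = fetchHeadInfo($url);
// $isPdf = ($info['response']['headers']['content-type'] ?? '') === 'application/pdf';
```

Separating the parsing from the request keeps the part that can be checked offline small, and the timeouts bound the worst case even if the server is slow to answer.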