PHP - Difference between `get_headers` and `stream_get_meta_data`?

Time: 2016-08-23 22:39:13

Tags: php get-headers

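Intro / Disclaimer

This question is long. It is still quite a read, but I am trying to be thorough in both the analysis and the questions. If you are familiar with stream_get_meta_data, feel free to skip to "Questions" at the end.

Outside of the docs, I am having a hard time finding much about PHP's stream_get_meta_data. Its overall functionality does not differ much from that of PHP's get_headers, but I cannot for the life of me find a comparison between the two, or the pros and cons of the former.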

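Setup

Up until now, I have always used PHP's get_headers to verify the validity of a URL. The downside of get_headers is that it is notoriously slow. Understandably, most of the latency is directly attributable to the server hosting the site of interest, but perhaps the method does more work than it needs to, or something else is slowing it down.

There are plenty of links recommending cURL, claiming it is faster, but I have run side-by-side timed tests of both, and get_headers has always come out on top, often by a factor of 1.5 or 2.

I have yet to see any solution that uses stream_get_meta_data, and I only stumbled upon it for the first time today. I have exhausted my Google skills without much luck. Still, in the interest of optimizing my program, I ran some tests.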

Tests

Running a comparison between get_headers and stream_get_meta_data using a list of 106 current (i.e. live, valid, status = 200) URLs:

Code Block #1

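// All URLs in format "http://www.domain.com"
$urls = array('...', '...', '...'); // *106 URLs
$headers = array(); // results collected from get_headers
$streams = array(); // results collected from stream_get_meta_data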

// get_headers
$start = microtime(true);
foreach($urls as $url) {
    try{
        // get_headers only accepts a context argument as of PHP 7.1, so switch the default context instead
        stream_context_set_default(array('http' => array('method' => "HEAD")));
        $headers[] = @get_headers($url, 1); 
        stream_context_set_default(array('http' => array('method' => "GET")));

    }catch(Exception $e){
        continue;
    }
}
$end1 = microtime(true) - $start;

// stream_get_meta_data
$cont = stream_context_create(array('http' => array('method' => "HEAD")));
$start = microtime(true);
foreach($urls as $url) {
    try{
        $fp = fopen($url, 'rb', false, $cont);
        if(!$fp) {
            continue;
        }
        $streams[] = stream_get_meta_data($fp);
        fclose($fp); // release the connection before moving on to the next URL

    }catch(Exception $e){
        continue;
    }
}
$end2 = microtime(true) - $start;

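The result I get is that stream_get_meta_data comes out on top 90% of the time or more. Sometimes the times are nearly identical, but more often than not stream_get_meta_data has the shorter run time.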

Run Times #1

"get_headers": 112.23 // seconds
"stream_get":  42.61 // seconds

The [stringified] output of both looks like this:

Excerpt of Comparison #1

url  ..  "http://www.wired.com/"

get_headers
|    0  ............................  "HTTP/1.1 200 OK"
|    Access-Control-Allow-Origin  ..  "*"
|    Cache-Control  ................  "stale-while-revalidate=86400, stale-while-error=86400"
|    Content-Type  .................  "text/html; charset=UTF-8"
|    Link  .........................  "; rel=\"https://api.w.org/\""
|    Server  .......................  "Apache"
|    Via
|    |    "1.1 varnish"
|    |    "1.1 varnish"
|    
|    Fastly-Debug-State  ...........  "HIT"
|    Fastly-Debug-Digest  ..........  "c245efbf14778c681ce317da114c1a762199e1326323d07b531d765e97fc8695"
|    Content-Length  ...............  "135495"
|    Accept-Ranges  ................  "bytes"
|    Date  .........................  "Tue, 23 Aug 2016 22:32:26 GMT"
|    Age  ..........................  "701"
|    Connection  ...................  "close"
|    X-Served-By  ..................  "cache-jfk8149-JFK, cache-den6024-DEN"
|    X-Cache  ......................  "HIT, HIT"
|    X-Cache-Hits  .................  "51, 1"
|    X-Timer  ......................  "S1471991546.459931,VS0,VE0"
|    Vary  .........................  "Accept-Encoding"

stream_get
|    wrapper_data
|    |    "HTTP/1.1 200 OK"
|    |    "Access-Control-Allow-Origin: *"
|    |    "Cache-Control: stale-while-revalidate=86400, stale-while-error=86400"
|    |    "Content-Type: text/html; charset=UTF-8"
|    |    "Link: ; rel=\"https://api.w.org/\""
|    |    "Server: Apache"
|    |    "Via: 1.1 varnish"
|    |    "Fastly-Debug-State: HIT"
|    |    "Fastly-Debug-Digest: c245efbf14778c681ce317da114c1a762199e1326323d07b531d765e97fc8695"
|    |    "Content-Length: 135495"
|    |    "Accept-Ranges: bytes"
|    |    "Date: Tue, 23 Aug 2016 22:32:26 GMT"
|    |    "Via: 1.1 varnish"
|    |    "Age: 701"
|    |    "Connection: close"
|    |    "X-Served-By: cache-jfk8149-JFK, cache-den6020-DEN"
|    |    "X-Cache: HIT, HIT"
|    |    "X-Cache-Hits: 51, 1"
|    |    "X-Timer: S1471991546.614958,VS0,VE0"
|    |    "Vary: Accept-Encoding"
|    
|    wrapper_type  .................  "http"
|    stream_type  ..................  "tcp_socket/ssl"
|    mode  .........................  "rb"
|    unread_bytes  .................  0
|    seekable  .....................  false
|    uri  ..........................  "http://www.wired.com/"
|    timed_out  ....................  false
|    blocked  ......................  true
|    eof  ..........................  false

For the most part it is all the same data, except that stream_get_meta_data offers no way to get wrapper_data with keys, short of parsing it manually.

Easy enough...

Code Block #2.1/2.2

$wd = $meta[$url]['wrapper_data'];
$wArr = wrapperToKeys($wd);

Where...

function wrapperToKeys($wd) {
    $wArr = array();
    foreach($wd as $row) {
        $pos = strpos($row, ':'); // Split on the first colon; the space after it is optional in HTTP

        if($pos === false) {
            $wArr[] = $row;
        }else {
            // $pos, $key and $value can probably be done with one good preg_match
            $key = substr($row, 0, $pos);
            $value = trim(substr($row, $pos + 1));

            // If key doesn't exist, assign value
            // (isset rather than empty, so a stored value of "0" is not treated as missing)
            if(!isset($wArr[$key])) {            
                $wArr[$key] = $value;
            }

            // If key already points to an array, add value to array
            else if(is_array($wArr[$key])) {    
                $wArr[$key][] = $value;
            }

            // If key currently points to string, swap value into an array
            else {                          
                $wArr[$key] = array($wArr[$key], $value);
            }
        }
    }

    return $wArr;
}

The output is identical to that of get_headers($url, 1):

Excerpt of Comparison #2

url  ..  "http://www.wired.com/"

headers
|    0  ............................  "HTTP/1.1 200 OK"
|    Access-Control-Allow-Origin  ..  "*"
|    Cache-Control  ................  "stale-while-revalidate=86400, stale-while-error=86400"
|    Content-Type  .................  "text/html; charset=UTF-8"
|    Link  .........................  "; rel=\"https://api.w.org/\""
|    Server  .......................  "Apache"
|    Via
|    |    "1.1 varnish"
|    |    "1.1 varnish"
|    
|    Fastly-Debug-State  ...........  "HIT"
|    Fastly-Debug-Digest  ..........  "c245efbf14778c681ce317da114c1a762199e1326323d07b531d765e97fc8695"
|    Content-Length  ...............  "135495"
|    Accept-Ranges  ................  "bytes"
|    Date  .........................  "Tue, 23 Aug 2016 22:35:29 GMT"
|    Age  ..........................  "883"
|    Connection  ...................  "close"
|    X-Served-By  ..................  "cache-jfk8149-JFK, cache-den6027-DEN"
|    X-Cache  ......................  "HIT, HIT"
|    X-Cache-Hits  .................  "51, 1"
|    X-Timer  ......................  "S1471991729.021214,VS0,VE0"
|    Vary  .........................  "Accept-Encoding"

w-arr
|    0  ............................  "HTTP/1.1 200 OK"
|    Access-Control-Allow-Origin  ..  "*"
|    Cache-Control  ................  "stale-while-revalidate=86400, stale-while-error=86400"
|    Content-Type  .................  "text/html; charset=UTF-8"
|    Link  .........................  "; rel=\"https://api.w.org/\""
|    Server  .......................  "Apache"
|    Via
|    |    "1.1 varnish"
|    |    "1.1 varnish"
|    
|    Fastly-Debug-State  ...........  "HIT"
|    Fastly-Debug-Digest  ..........  "c245efbf14778c681ce317da114c1a762199e1326323d07b531d765e97fc8695"
|    Content-Length  ...............  "135495"
|    Accept-Ranges  ................  "bytes"
|    Date  .........................  "Tue, 23 Aug 2016 22:35:29 GMT"
|    Age  ..........................  "884"
|    Connection  ...................  "close"
|    X-Served-By  ..................  "cache-jfk8149-JFK, cache-den6021-DEN"
|    X-Cache  ......................  "HIT, HIT"
|    X-Cache-Hits  .................  "51, 1"
|    X-Timer  ......................  "S1471991729.173641,VS0,VE0"
|    Vary  .........................  "Accept-Encoding"

Even with the keys sorted out, stream_get_meta_data is the champ:

Sample Run Times #2

"get_headers": 99.51 // seconds
"stream_get": 43.79 // seconds

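NOTE: These tests were run on a cheap shared server, so test times vary widely. That said, the gap between the two methods was highly consistent from test to test.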
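Since single run times on a shared server are noisy, one way to make the comparison more stable is to repeat each benchmark several times and compare medians. The following is only a sketch: medianSeconds and medianRunTime are hypothetical helper names, not PHP built-ins, and the callable passed in is assumed to wrap one of the loops from Code Block #1.

```php
<?php
// Hypothetical helper: median of a list of run times (in seconds).
// Returns null for an empty list.
function medianSeconds(array $times)
{
    sort($times);
    $n = count($times);
    if ($n === 0) {
        return null;
    }
    $mid = (int) floor($n / 2);
    if ($n % 2 === 1) {
        return $times[$mid];                        // odd count: middle element
    }
    return ($times[$mid - 1] + $times[$mid]) / 2;   // even count: mean of the two middle elements
}

// Hypothetical helper: runs $fn $runs times and returns the median run time.
function medianRunTime($fn, $runs)
{
    $times = array();
    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true);
        call_user_func($fn);
        $times[] = microtime(true) - $start;
    }
    return medianSeconds($times);
}

// Illustrative values only, not measurements from the tests above.
echo medianSeconds(array(3.0, 1.0, 2.0)), "\n";
```

A median is less sensitive than a mean to the occasional very slow run, which seems to fit the shared-server situation described in the note.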

Miscellaneous

For those who understand PHP's C source and feel they might glean some insight from it, the function definitions can be found at:

'get_headers' (PHP Git)

'stream_get_meta_data' (PHP Git)

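Questions

  1. Why is stream_get_meta_data so underrepresented (in searches and in available code snippets) compared to get_headers?

    The way I have worded this invites opinion, but my intent is more along the lines of: "Is there something so well known and dreadful about stream_get_meta_data that it tends to deter people from using it?"

  2. Similar to the first, is there a well-known industry consensus on the pros and cons of the two? The kind of thing that someone with a more well-rounded understanding of CS would know. Perhaps get_headers is more secure/robust, and less vulnerable to bad actors and inconsistencies in server output? Or is one of them known to work in cases where the other produces an error?

    From what I have found, stream_get_meta_data does have a couple of notes and warnings (... for fopen), but nothing so dire that it could not be worked around.

  3. As long as it is safe and consistent, I would like to incorporate it into my project, since this operation is performed often and cutting the run time in half would make a substantial difference.

    Edit #1

    I have since found some URLs that succeed with get_headers but trigger warnings with stream_get_meta_data: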
    PHP Warning:  fopen(http://www.alealimay.com/): failed to open stream: HTTP request failed! HTTP/1.0 400 Bad Request
    
    PHP Warning:  fopen(http://www.thelovelist.net/): failed to open stream: HTTP request failed! HTTP/1.0 400 Bad Request
    
    PHP Warning:  fopen(http://www.bleedingcool.com/): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden
    

    get_headers returns only the 403 Forbidden status, even though you can paste the URLs into a browser and see that they are working sites.
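One possible factor, not verified against the URLs above: by default, fopen on an http:// URL fails with a warning when the server answers with a 4xx or 5xx status, and PHP sends no User-Agent header unless one is configured, which some servers answer with 400 or 403. The HTTP context options ignore_errors and user_agent address both points. This is only a sketch: headContext is a hypothetical helper name, and the user-agent string and timeout are placeholder values.

```php
<?php
// Hypothetical helper: builds an HTTP context for header-only checks.
function headContext($userAgent, $timeoutSeconds)
{
    return stream_context_create(array(
        'http' => array(
            'method'        => 'HEAD',
            'ignore_errors' => true,            // still return a stream on 4xx/5xx responses
            'user_agent'    => $userAgent,      // placeholder value; some servers reject requests without one
            'timeout'       => $timeoutSeconds, // read timeout in seconds
        ),
    ));
}

$cont = headContext('Mozilla/5.0 (compatible; link-checker)', 10);

// Usage, mirroring Code Block #1 (not executed here because it needs network access):
// $fp = fopen($url, 'rb', false, $cont);
// if ($fp) { $meta = stream_get_meta_data($fp); fclose($fp); }
```

With ignore_errors enabled, the status line of a failed request ends up in wrapper_data instead of only in a warning, so the caller can decide what counts as a valid URL.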

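    I am not sure what to make of this: a breakdown in stream_get_meta_data, and incomplete headers from get_headers (which should include all redirects and the final status_code = 200 for a functioning website).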
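For URL validation, the interesting part of wrapper_data is the final status line. As far as I can tell, when the http wrapper follows redirects, the headers of every hop are appended to the same array, so the last line starting with "HTTP/" should belong to the final response. A minimal sketch under that assumption (lastStatusCode is a hypothetical helper name, and the sample array is made up for illustration):

```php
<?php
// Hypothetical helper: returns the status code of the last status line
// found in a wrapper_data array, or null if no status line is present.
function lastStatusCode(array $wrapperData)
{
    $code = null;
    foreach ($wrapperData as $row) {
        // A status line looks like "HTTP/1.1 200 OK"; ordinary header rows do not match.
        if (is_string($row) && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $row, $m)) {
            $code = (int) $m[1]; // later status lines overwrite earlier ones
        }
    }
    return $code;
}

// Made-up sample: one redirect hop followed by the final response.
$sample = array(
    'HTTP/1.1 301 Moved Permanently',
    'Location: http://www.example.com/',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html; charset=UTF-8',
);
echo lastStatusCode($sample), "\n"; // 200
```

The same function works on the numerically indexed array returned by get_headers($url), since both list the raw header lines in order.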

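    Thanks very much, if you have made it this far.

    Also, if you downvote, please leave a comment so that I can improve the question and we can all learn for future cases.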

0 answers:

No answers