Question

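I am using file_get_contents to fetch remote pages. Many of them return a 404 error with a custom (and heavy) 404 page.

Is there a way to stop as soon as the 404 header is detected, without downloading the whole page?

(Maybe curl or wget can do this?)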

Answer 1

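No, that is not possible.

HTTP provides some scope for conditional requests (such as If-Modified-Since), but nothing that triggers on the status code.

The closest you can get is to make a HEAD request and then, if it does not return an error code, follow up with a GET request. That costs two requests for every good resource, so you may lose more than you gain by skipping the bodies of the bad ones.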
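
The HEAD-then-GET idea could be sketched roughly as follows. This is only a minimal illustration, assuming PHP 7.1 or later (where get_headers accepts a stream context); parse_status_code and fetch_if_ok are hypothetical helper names, and a redirect is simply treated as "not OK" here because only the first status line is inspected.

```php
<?php
// Hypothetical helper: pull the numeric status code out of a status line
// such as "HTTP/1.1 404 Not Found". Returns 0 if the line is not recognized.
function parse_status_code(string $statusLine): int
{
    return preg_match('#^HTTP/\S+\s+(\d{3})#', $statusLine, $m) ? (int) $m[1] : 0;
}

// Hypothetical helper: send a HEAD request first and only issue the GET
// when the first status line reports 200. Returns the body, or null otherwise.
function fetch_if_ok(string $url): ?string
{
    // Ask for headers only, so no response body is transferred for this request.
    $head = stream_context_create(['http' => ['method' => 'HEAD']]);
    $headers = @get_headers($url, false, $head);
    if ($headers === false || parse_status_code($headers[0]) !== 200) {
        return null;
    }
    // Second request: the actual download.
    $body = @file_get_contents($url);
    return $body === false ? null : $body;
}
```

As the answer points out, every page that does exist now costs two round trips, so this only pays off when 404s are frequent and their bodies are large.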

Answer 2

I would do the following:

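$pageUrl = "http://www.example.com/myfile/which/may/not.exist";
// get_headers() returns false on failure; note that it sends a GET by default
$headers = @get_headers($pageUrl);
// Check the numeric status code rather than the full status line, because
// the HTTP version and reason phrase ("Not Found" vs "NOT FOUND") vary by server
$status = ($headers !== false && preg_match('#^HTTP/\S+\s+(\d{3})#', $headers[0], $m)) ? (int) $m[1] : 0;
if ($status === 200) {
  // OK - download
  $download = file_get_contents($pageUrl);
} elseif ($status === 404) {
  // NOT OK - show error
}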

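You could also do a substring search on the status line (for example with strpos) instead of comparing the whole string.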
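
A substring check along those lines might look like the sketch below. The function name status_line_has_code is hypothetical, and padding the code with spaces is just one way to avoid matching the digits inside some other part of the line.

```php
<?php
// Hypothetical helper: true when the status line contains the given code as
// a separate token, e.g. " 404 " in "HTTP/1.1 404 Not Found".
function status_line_has_code(string $statusLine, int $code): bool
{
    // Append a space so a bare status line like "HTTP/2 404" also matches.
    return strpos($statusLine . ' ', ' ' . $code . ' ') !== false;
}
```

With this, status_line_has_code($headers[0], 404) no longer depends on the HTTP version or on the capitalization of the reason phrase.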

Based on PHP's manual page for get_headers.

Example output:

Array
(
    [0] => HTTP/1.1 200 OK
    [1] => Date: Sat, 29 May 2004 12:28:13 GMT
    [2] => Server: Apache/1.3.27 (Unix)  (Red-Hat/Linux)
    [3] => Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
    [4] => ETag: "3f80f-1b6-3e1cb03b"
    [5] => Accept-Ranges: bytes
    [6] => Content-Length: 438
    [7] => Connection: close
    [8] => Content-Type: text/html
)
