How do I correctly parse the base URL from HTTP headers?

Time: 2018-01-23 14:25:04

Tags: parsing http-headers base-url

According to HTML and URLs:

The base URL can be given by an HTTP header (see [RFC2068]).

The headers I receive are:

HTTP/1.1 301 Moved Permanently
Date: Tue, 23 Jan 2018 14:12:27 GMT
Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips PHP/5.4.16
X-Powered-By: PHP/5.4.16
Location: https://www.meetangee.com/
Content-Length: 0
Content-Type: text/html; charset=UTF-8

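Is the base URL the URL specified by the Location header? If so, should I just parse the characters after Location: until the end of the line, or is there an even simpler way to get the base URL?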

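Edit: The answer to "Is there an HTTP header to say what base URL to use for relative links?" contradicts the reference I linked. If that answer is correct (which I am not assuming), I would like a source showing that the reference I linked is wrong.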

1 answer:

Answer 0 (score: 1)

The base URL can be given by an HTTP header (see [RFC2068]).

That was true up to and including RFC 2616, but it is no longer true as of RFC 7231. See my answer to "Is there an HTTP header to say what base URL to use for relative links?" for details.

Is the base URL the URL specified by the Location header?

No. The Location header on a 3xx response identifies a redirect target, not a base URL. A user agent is meant to follow the redirect by sending a new request to the specified URL. Some redirects are temporary: the original URL should continue to be used for future requests for the same entity. Others are permanent: the original URL should no longer be used and is replaced by the new URL.
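If you do need to read the redirect target yourself, there is no need to scan characters by hand. The following is only a minimal sketch, assuming you already have the raw response head as a string. It uses Python's standard library to parse the header block and resolves the Location value against the request URL, since the value may be relative. The function name redirect_target and the request URL http://example.com/old-page are hypothetical, chosen for illustration.

```python
from email.parser import Parser
from urllib.parse import urljoin

# Response head from the question, with CRLF line endings as sent on the wire.
RAW_RESPONSE = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Date: Tue, 23 Jan 2018 14:12:27 GMT\r\n"
    "Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips PHP/5.4.16\r\n"
    "X-Powered-By: PHP/5.4.16\r\n"
    "Location: https://www.meetangee.com/\r\n"
    "Content-Length: 0\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "\r\n"
)


def redirect_target(request_url, raw_response):
    """Return the absolute redirect URL, or None if the response is not a redirect.

    Hypothetical helper: request_url is the URL that produced raw_response.
    """
    lines = raw_response.splitlines()
    if not lines:
        return None
    # The status line looks like "HTTP/1.1 301 Moved Permanently".
    status = int(lines[0].split()[1])
    # Parse the remaining lines as headers; lookups are case-insensitive.
    headers = Parser().parsestr("\n".join(lines[1:]), headersonly=True)
    location = headers.get("Location")
    if not (300 <= status < 400) or location is None:
        return None
    # Location may be a relative reference, so resolve it against the request URL.
    return urljoin(request_url, location.strip())


# "http://example.com/old-page" is an assumed request URL, not from the question.
print(redirect_target("http://example.com/old-page", RAW_RESPONSE))
```

The value returned here is the URL to request next. It is not a base URL for the original response.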

If so, should I just parse the characters after Location: until the end of the line, or is there an even simpler way to get the base URL?

An entity's base URL is the final non-redirect URL used to request the entity, unless specified otherwise by the entity itself (such as in a <base> element in HTML).
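The rule above can be sketched as follows. This is an illustration under stated assumptions rather than a complete implementation of the HTML base URL algorithm: it assumes you already know the final URL after following redirects and have the HTML body as text. The names document_base_url and _BaseFinder are hypothetical. Only the first <base> element with an href is considered.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _BaseFinder(HTMLParser):
    """Records the href of the first <base> element encountered."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from HTMLParser.
        if tag == "base" and self.href is None:
            href = dict(attrs).get("href")
            if href:
                self.href = href


def document_base_url(final_url, html_text):
    """Return the base URL for resolving relative links in html_text.

    final_url is the last non-redirect URL used to fetch the document.
    """
    finder = _BaseFinder()
    finder.feed(html_text)
    if finder.href:
        # A <base href> may itself be relative to the document URL.
        return urljoin(final_url, finder.href)
    return final_url


# Hypothetical document body, for illustration only.
page = "<html><head><base href='/app/'></head><body></body></html>"
base = document_base_url("https://www.meetangee.com/", page)
print(urljoin(base, "img/logo.png"))
```

Relative links in the document are then resolved with urljoin against the returned base, as in the last line.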