Question

I am trying to parse URLs in PHP, where the input can be any of the following:

Code:

$info = parse_url('http://www.domainname.com/');
print_r($info);

$info = parse_url('www.domain.com');
print_r($info);

$info = parse_url('/test/');
print_r($info);

$info = parse_url('test.php');
print_r($info);

Returns:

Array
(
    [scheme] => http
    [host] => www.domainname.com
    [path] => /
)
Array
(
    [path] => www.domain.com
)
Array
(
    [path] => /test/
)
Array
(
    [path] => test.php
)

As you can see, the problem is the second example, where the domain is returned as the path.

Answer 1

This gives the correct result, but file names need to start with a slash:

parse('http://www.domainname.com/');
parse('www.domain.com');
parse('/test/');
parse("/file.php");

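function parse($url){
    // Assume http:// when the input has no scheme and is not an absolute path.
    if(strpos($url,"://")===false && substr($url,0,1)!="/") $url = "http://".$url;
    $info = parse_url($url);
    // parse_url() returns false for seriously malformed URLs.
    if($info) print_r($info);
}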

The result is:

Array
(
    [scheme] => http
    [host] => www.domainname.com
    [path] => /
)
Array
(
    [scheme] => http
    [host] => www.domain.com
)
Array
(
    [path] => /test/
)
Array
(
    [path] => /file.php
)
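
To illustrate why the leading slash matters, here is a small sketch that applies the same prefixing rule but returns the array instead of printing it. The helper name add_default_scheme is hypothetical and used only for this illustration.

```php
<?php
// Same prefixing rule as parse() above, but returning the parsed array.
// add_default_scheme is a hypothetical name, not part of PHP.
function add_default_scheme($url)
{
    if (strpos($url, '://') === false && substr($url, 0, 1) != '/') {
        $url = 'http://' . $url;
    }
    return parse_url($url);
}

$withoutSlash = add_default_scheme('test.php');
$withSlash    = add_default_scheme('/test.php');

// Without the slash, the file name is prefixed and ends up as the host.
echo isset($withoutSlash['host']) ? "host: {$withoutSlash['host']}\n" : "no host\n";
// With the slash, the input is left alone and stays a path.
echo isset($withSlash['path']) ? "path: {$withSlash['path']}\n" : "no path\n";
```

This prints "host: test.php" followed by "path: /test.php", so a bare file name cannot be told apart from a bare domain with this approach.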

Answer 2

To handle a URL in a way that preserves the fact that it was scheme-less, whilst still allowing the domain to be identified, use the following code.

if (!preg_match('/^(?:[a-z][a-z0-9+.\-]*:|\/)/i', $url)) {
    $url = '//' . $url;
}

This prepends "//" to the URL only if it does not start with a valid scheme and does not begin with "/".
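
As a minimal sketch, the check can be wrapped in a helper. The name parse_schemeless is hypothetical, and the pattern used here groups both alternatives under "^", on the assumption that a "/" later in the string should not count as "begins with a slash".

```php
<?php
// parse_schemeless is a hypothetical helper name, not part of PHP.
function parse_schemeless($url)
{
    // Both alternatives are anchored at the start of the string:
    // a scheme name followed by ":", or a leading "/".
    if (!preg_match('/^(?:[a-z][a-z0-9+.\-]*:|\/)/i', $url)) {
        $url = '//' . $url;
    }
    return parse_url($url);
}

$bare     = parse_schemeless('www.domain.com/test/');
$absolute = parse_schemeless('/test/');
$full     = parse_schemeless('http://www.domainname.com/');

print_r($bare);
```

For 'www.domain.com/test/' the result has a host of www.domain.com and a path of /test/, with no scheme key, so the caller can still tell that the input was scheme-less. Note that the "//" form is only parsed this way by parse_url() in PHP 5.4.7 and later.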

Some quick background on this:

The parser treats the (valid) characters before ":" as the scheme, and the characters following "//" as the domain. To indicate that the URL has both a scheme and a domain, the two markers must be used consecutively, as "://". For example:

[scheme]:[path//path]
//[domain][/path]
[scheme]://[domain][/path]
[/path]
[path]
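
The five shapes listed above can be checked directly against parse_url(). This is only an illustrative sketch; the sample URLs are made up, one per shape.

```php
<?php
// One made-up sample per shape from the list above.
$samples = [
    'mailto:user@example.com',      // [scheme]:[path]
    '//www.domain.com/test/',       // //[domain][/path]
    'http://www.domain.com/test/',  // [scheme]://[domain][/path]
    '/test/',                       // [/path]
    'test.php',                     // [path]
];

// Record which components parse_url() reports for each sample.
$keys = [];
foreach ($samples as $sample) {
    $keys[$sample] = implode(', ', array_keys(parse_url($sample)));
    echo str_pad($sample, 30), ' => ', $keys[$sample], "\n";
}
```

On PHP 5.4.7 or later, the reported components are "scheme, path", "host, path", "scheme, host, path", "path" and "path" respectively.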

This is how PHP parses URLs with parse_url(), but I couldn't say whether it conforms to the standard.

The rule for a valid scheme name is: alpha *( alpha | digit | "+" | "-" | "." )
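
The scheme rule translates fairly directly into a regular expression. The sketch below is one possible reading of it; is_valid_scheme is a hypothetical helper name, and matching case-insensitively is an assumption.

```php
<?php
// is_valid_scheme is a hypothetical helper name, not part of PHP.
function is_valid_scheme($name)
{
    // One letter, then any number of letters, digits, "+", "-" or ".".
    // \z anchors at the very end of the string (no trailing newline allowed).
    return preg_match('/^[a-z][a-z0-9+.\-]*\z/i', $name) === 1;
}

var_dump(is_valid_scheme('http'));           // bool(true)
var_dump(is_valid_scheme('svn+ssh'));        // bool(true)
var_dump(is_valid_scheme('1http'));          // bool(false)
var_dump(is_valid_scheme('www.domain.com')); // bool(true)
```

The last case shows why the ":" marker matters: a bare domain is itself a syntactically valid scheme name, so only the following ":" tells the parser to treat it as a scheme.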

PHP Parse URL - domain returned as path when protocol prefix is not present
