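Get the domain, including its subdomain, from a URL

Time: 2015-05-13 19:50:45

Tags: php regex subdomain

I use this function to get the domain and subdomain from a string. But if the string is already in the format I expect, it returns null: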

function getDomainFromUrl($url) {
    $host = parse_url($url, PHP_URL_HOST);
    return preg_replace('/^www\./', '', $host);
}

$url = "http://abc.example.com/" -> abc.example.com | OK

$url = "http://www.example.com/" -> example.com | OK

$url = "abc.example.com" -> FAILS!

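4 answers:

Answer 0 (score: 3)

This happens because abc.example.com has no scheme, so parse_url() does not recognize a host in it and PHP_URL_HOST comes back empty. You need to check for that case first. A simple approach: if the URL has no protocol, add one: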

function addhttp($url) {
    if (!preg_match("~^(?:f|ht)tps?://~i", $url)) {
        $url = "http://" . $url;
    }
    return $url;
}

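function getDomainFromUrl($url) {
    $host = parse_url($url, PHP_URL_HOST);
    if (!$host) {
        // Not a URL with a protocol: add one and parse a second time.
        // Parsing again instead of recursing avoids an endless loop on input such as "" or "http://".
        $host = parse_url(addhttp($url), PHP_URL_HOST);
    }
    // Return null when no host can be found even after adding a protocol.
    return $host ? preg_replace('/^www\./', '', $host) : null;
}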
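To see why the scheme matters, here is a minimal sketch of what parse_url() returns with and without one. The variable names are only for illustration and are not part of the answer above.

```php
<?php
// Without a scheme, parse_url() treats the whole string as a path, so no host is found.
$hostWithoutScheme = parse_url("abc.example.com", PHP_URL_HOST);
$pathWithoutScheme = parse_url("abc.example.com", PHP_URL_PATH);

// With a scheme, the same name is recognized as the host.
$hostWithScheme = parse_url("http://abc.example.com", PHP_URL_HOST);

var_dump($hostWithoutScheme); // NULL
var_dump($pathWithoutScheme); // string(15) "abc.example.com"
var_dump($hostWithScheme);    // string(15) "abc.example.com"
```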

Answer 1 (score: 3)

Here is a pure regex solution:

function getDomainFromUrl($url) {
    if (preg_match('/^(?:https?:\/\/)?(?:(?:[^@]*@)|(?:[^:]*:[^@]*@))?(?:www\.)?([^\/:]+)/', $url, $parts)) {
        return $parts[1];
    }
    return false; // or maybe '', depending on what you need
}

getDomainFromUrl("http://abc.example.com/"); // abc.example.com

getDomainFromUrl("http://www.example.com/"); // example.com

getDomainFromUrl("abc.example.com");         // abc.example.com

getDomainFromUrl("username@abc.example.com"); // abc.example.com

getDomainFromUrl("https://username:password@abc.example.com"); // abc.example.com

getDomainFromUrl("https://username:password@abc.example.com:123"); // abc.example.com

You can try it here: http://sandbox.onlinephpfunctions.com/code/3f0343bbb68b190bffff5d568470681c00b0c45c

If you want to know more about the regular expression:

^                 matching must start at the beginning of the string
(?:https?:\/\/)?  an optional, non-capturing group that matches http:// or https://
(?:(?:[^@]*@)|(?:[^:]*:[^@]*@))?
                  an optional, non-capturing group that matches either *@ or *:*@, where * stands for any run of characters
(?:www\.)?        an optional, non-capturing group that matches www.
([^\/:]+)         a capturing group that matches everything up to a '/', a ':', or the end of the string
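The breakdown above can also be kept next to the pattern itself. The following is a sketch, not part of the original answer: it writes the same expression with the x (extended) flag so each part carries its own comment, and uses ~ as the delimiter so the slashes need no escaping. The function name getDomainFromUrlVerbose is made up for this example.

```php
<?php
// Hypothetical variant of getDomainFromUrl(): same pattern, written in extended mode.
function getDomainFromUrlVerbose($url) {
    $pattern = '~
        ^                                   # anchor at the start of the string
        (?:https?://)?                      # optional http:// or https://
        (?:(?:[^@]*@)|(?:[^:]*:[^@]*@))?    # optional user@ or user:password@
        (?:www\.)?                          # optional leading www.
        ([^/:]+)                            # host: everything up to "/", ":" or the end
    ~x';
    // $parts[1] holds the captured host when the pattern matches.
    return preg_match($pattern, $url, $parts) ? $parts[1] : false;
}

echo getDomainFromUrlVerbose("https://username:password@abc.example.com:123"), "\n"; // abc.example.com
```

One thing this layout makes easier to spot: the user-info part accepts any characters before an @, so an @ that appears later in the URL (for example in a query string) is treated as the end of the credentials. If such input is possible, parse_url() is probably the safer choice.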

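Answer 2 (score: 0)

The problem is that parse_url() finds no host here: for a string without a scheme, the PHP_URL_HOST component comes back as null (false is returned only for seriously malformed URLs). Check that you actually got a result before using it; otherwise $host is empty.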

<?php
function getDomainFromUrl($url) {
    $host = parse_url($url, PHP_URL_HOST);
    // null, false, or an empty string means no host was found: fall back to the raw input.
    if ($host == '') {
        $host = $url;
    }
    return preg_replace('/^www\./', '', $host);
}
echo getDomainFromUrl("http://abc.example.com/") . "\n";
echo getDomainFromUrl("http://www.example.com/") . "\n";
echo getDomainFromUrl("abc.example.com");

Output:

abc.example.com
example.com
abc.example.com

Answer 3 (score: 0)

The parse_url() function does not work with relative URLs. You can test whether a scheme is present and, if it is missing, add a default one:

if ( !preg_match( '/^([^\:]+)\:\/\//', $url ) ) $url = 'http://' . $url;
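Combined with parse_url(), that check might be used as in the sketch below. The function name getHostWithDefaultScheme and the choice to return null when no host is found are assumptions made for this example, not something the answer specifies.

```php
<?php
// Hypothetical helper: applies the scheme check above, then extracts the host.
function getHostWithDefaultScheme($url) {
    // If the string does not start with "scheme://", assume http.
    if (!preg_match('/^([^\:]+)\:\/\//', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    // parse_url() yields null or false when no host can be extracted.
    if (!is_string($host)) {
        return null;
    }
    // Strip a leading "www." as in the question's original function.
    return preg_replace('/^www\./', '', $host);
}

echo getHostWithDefaultScheme("abc.example.com"), "\n";         // abc.example.com
echo getHostWithDefaultScheme("http://www.example.com/"), "\n"; // example.com
```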