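Recently there was a question about how to get the domain of any URL as a string.
Unfortunately that question has been closed, and the answers linked so far point only to regex-based solutions (which fail for special cases like .co.uk) or to static solutions that hard-code those exceptions (which, of course, may change over time).
So I went looking for a generic solution to this problem that keeps working over time, and I found one. (At least a couple of tests came back positive.)
If you find a domain for which the attempted solution does not work, feel free to mention it, and I will try to edit the snippet to cover that case as well.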
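To illustrate why the purely string-based approaches fall short, here is a minimal sketch of a "keep the last two labels" rule. The helper name naiveDomain is made up for this illustration and is not taken from any of the linked answers.

```php
<?php
// Hypothetical helper (not from the linked answers): keeps only the
// last two dot-separated labels of the host name.
function naiveDomain(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // no host component could be extracted
    }
    $labels = explode('.', strtolower($host));
    return implode('.', array_slice($labels, -2));
}

echo naiveDomain('http://www.stackoverflow.com'), "\n"; // stackoverflow.com
echo naiveDomain('http://www.google.co.uk'), "\n";      // co.uk, which is not the registered domain
```

The second call shows the .co.uk problem: the rule has no way of knowing that co.uk is a public suffix rather than a registered domain.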
Answer 0 (score: 3)
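To find the domain of any given string, a three-step solution seems to work best:
1. Use parse_url (http://php.net/manual/en/function.parse-url.php) to extract the actual hostname.
2. Reassemble the hostname label by label, starting from the top-level domain, and use checkdnsrr (http://php.net/manual/en/function.checkdnsrr.php) to look for the first name that has a valid A record.
3. Confirm the match by resolving it to an IP and ruling out the default server IPs that some ISPs return for invalid domains.
I have only run a few tests, but the results look as expected. The method generates output directly, but it can be modified to return the domain name instead: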
<?php
getDomain("http://www.stackoverflow.com");
getDomain("http://www.google.co.uk");
getDomain("http://books.google.co.uk");
getDomain("http://a.b.c.google.co.uk");
getDomain("http://www.nominet.org.uk/intelligence/statistics/registration/");
getDomain("http://invalid.fail.pooo");
getDomain("http://AnotherOneThatShouldFail.com");

function getDomain($url){
    echo "Searching Domain for '".$url."': ";

    //Step 1: Get the actual hostname
    $actualHostname = parse_url($url, PHP_URL_HOST);
    if (!is_string($actualHostname) || $actualHostname === ""){
        //parse_url found no host component (malformed URL or missing scheme)
        echo " not resolved.<br />";
        return;
    }

    //Step 2: Top-down approach: check DNS records for the first valid A record.
    //Re-assemble the hostname step by step, i.e. for www.google.co.uk, check:
    // - uk
    // - co.uk
    // - google.co.uk (will match here)
    // - www.google.co.uk (never reached, the function returns on the first match)
    $domainParts = explode(".", $actualHostname);
    $currentCountry = $domainParts[count($domainParts)-1]; //top-level label, e.g. "uk"
    for ($i = count($domainParts)-1; $i >= 0; $i--){
        $domain = implode(".", array_slice($domainParts, $i));
        $validRecord = checkdnsrr($domain, "A"); //looking for type A (address) records
        if ($validRecord){
            //If the host can be resolved to an IP, it seems valid.
            //gethostbyname returns the unmodified hostname on failure.
            $hostIp = gethostbyname($domain);
            $validRecord = ($hostIp != $domain);
            if ($validRecord){
                //Step 3: the DNS server might answer with one of the ISP's default server IPs for invalid domains.
                //Test for this by querying a domain of the same "country" that is certainly invalid to obtain the
                //IP list of the ISP's default servers, then compare it with the response for the current $domain.
                //gethostbynamel returns false if the name does not resolve, so fall back to an empty list.
                $defaultIps = gethostbynamel("iiiiiiiiiiiiiiiiiinvaliddomain." . $currentCountry) ?: [];
                $validRecord = !in_array($hostIp, $defaultIps);
            }
        }
        //valid record?
        if ($validRecord){
            //return $domain;
            echo $domain."<br />";
            return;
        }
    }
    //return null;
    echo " not resolved.<br />";
}
?>
Output of the above example:
Searching Domain for 'http://www.stackoverflow.com': stackoverflow.com
Searching Domain for 'http://www.google.co.uk': google.co.uk
Searching Domain for 'http://books.google.co.uk': google.co.uk
Searching Domain for 'http://a.b.c.google.co.uk': google.co.uk
Searching Domain for 'http://www.nominet.org.uk/intelligence/statistics/registration/': nominet.org.uk
Searching Domain for 'http://invalid.fail.pooo': not resolved.
Searching Domain for 'http://AnotherOneThatShouldFail.com': not resolved.
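To make the lookup order easier to follow, the top-down reassembly can be isolated into a small pure function. This is only a sketch: candidateDomains is a name invented here, and it performs no DNS queries, it just lists the names that would be checked, shortest suffix first.

```php
<?php
// Hypothetical helper: lists the names that would be checked for a host,
// starting with the top-level label and adding one label per step.
function candidateDomains(string $host): array
{
    $labels = explode('.', strtolower(trim($host, '.')));
    $candidates = [];
    for ($i = count($labels) - 1; $i >= 0; $i--) {
        $candidates[] = implode('.', array_slice($labels, $i));
    }
    return $candidates;
}

print_r(candidateDomains('www.google.co.uk'));
// uk, co.uk, google.co.uk, www.google.co.uk
```

The first candidate in this list that passes the DNS checks is what gets reported as the domain, which is why all three google.co.uk test cases give the same result.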
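This is only a very limited set of test cases, but I cannot imagine a case where a domain would not have an A record.
As a nice side effect, this also validates the URL rather than relying only on a theoretically valid format, as the last example shows.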
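For anyone who wants a return value instead of echoed output, a variant could look roughly like the sketch below. The function name findDomain and the $resolves callback are assumptions made for this example. In real use the callback would wrap checkdnsrr and the ISP default-server check; here a stub stands in for DNS so the example runs offline.

```php
<?php
// Sketch: returns the first candidate accepted by $resolves, or null.
// $resolves is a hypothetical callback (string $domain): bool.
function findDomain(string $url, callable $resolves): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // nothing to look up
    }
    $labels = explode('.', strtolower($host));
    for ($i = count($labels) - 1; $i >= 0; $i--) {
        $domain = implode('.', array_slice($labels, $i));
        if ($resolves($domain)) {
            return $domain;
        }
    }
    return null;
}

// Stub resolver standing in for the DNS checks (assumption for the demo).
$known = ['google.co.uk', 'stackoverflow.com'];
$stub = fn (string $d): bool => in_array($d, $known, true);

var_dump(findDomain('http://books.google.co.uk', $stub)); // string(12) "google.co.uk"
var_dump(findDomain('http://invalid.fail.pooo', $stub));  // NULL
```

Passing the resolver in as a callback keeps the string handling testable without network access, and a null result doubles as the "URL does not resolve" signal.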
Best regards, dognose