PHP: Get the domain (without subdomains) of any address given as a string

Date: 2015-06-04 17:23:17

Tags: php url dns

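Recently there was a question about how to get the domain of any URL given as a string.

Unfortunately that question was closed, and the linked answers so far only point to regex-based solutions (which fail for special cases like .co.uk) and to static solutions that hard-code those exceptions (which, of course, may change over time).

So I looked for a generic solution to this problem that keeps working over time, and found one. (At least a few tests were positive.)

If you find a domain for which this solution does not work, feel free to mention it and I will try to cover that case with a snippet as well.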

1 answer:

Answer (score: 3)

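To find the domain of any given string, a three-step solution seems to work best:

- Extract the hostname from the string with parse_url().
- Starting from the top-level domain, re-assemble the hostname label by label (uk, co.uk, google.co.uk, ...) and stop at the first candidate that has a DNS A record and actually resolves to an IP address.
- Reject that match if its IP is one of the ISP's default addresses for non-existent domains, obtained by resolving a name under the same top-level domain that certainly does not exist.

I have only run a few tests, but the results look as expected. The function below prints its result directly; it can be modified to return the domain name instead: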

<?php

getDomain("http://www.stackoverflow.com");
getDomain("http://www.google.co.uk");
getDomain("http://books.google.co.uk");
getDomain("http://a.b.c.google.co.uk");
getDomain("http://www.nominet.org.uk/intelligence/statistics/registration/");
getDomain("http://invalid.fail.pooo");
getDomain("http://AnotherOneThatShouldFail.com");


function getDomain($url){
  echo "Searching Domain for '".$url."': ";
  //Step 1: Get the actual hostname.
  //parse_url() returns false for seriously malformed URLs, and the result has
  //no "host" key when the input lacks a scheme (e.g. "www.google.co.uk").
  $urlParts = parse_url($url);
  if ($urlParts === false || !isset($urlParts["host"])){
    echo " not resolved.<br />";
    return;
  }
  $actualHostname = $urlParts["host"];

  //Step 2: Top-down approach: check DNS records for the first valid A record.
  //Re-assemble the hostname step by step, e.g. for www.google.co.uk, check:
  // - uk
  // - co.uk
  // - google.co.uk (will match here)
  // - www.google.co.uk (will be skipped)

  $domainParts = explode(".", $actualHostname);
  //Top-level label (e.g. "uk"), used for the check in step 3.
  $currentCountry = end($domainParts);

  for ($i = count($domainParts)-1; $i >= 0; $i--){
    //Candidate made of labels $i .. end, e.g. "co.uk", then "google.co.uk".
    $domain = implode(".", array_slice($domainParts, $i));
    $validRecord = checkdnsrr($domain, "A"); //looking for A (IPv4 address) records

    if ($validRecord){
       //If the host can be resolved to an IP, it seems valid.
       //gethostbyname() returns the unmodified hostname on failure, which means invalid.
       $hostIp = gethostbyname($domain);
       $validRecord = ($hostIp !== $domain);

       if ($validRecord){
         //Step 3: the DNS server might answer with one of the ISP's default server IPs for invalid domains.
         //Query a domain under the same TLD that is certainly invalid to obtain the list of the
         //ISP's default server IPs, then compare it with the response for the current $domain.
         //gethostbynamel() returns false when nothing resolves, so fall back to an empty list.
         $defaultIps = gethostbynamel("iiiiiiiiiiiiiiiiiinvaliddomain." . $currentCountry) ?: [];
         $validRecord = !in_array($hostIp, $defaultIps, true);
       }
    }

    //valid record?
    if ($validRecord){
      //return $domain;
      echo $domain."<br />";
      return;
    }
  }
  //return null;
  echo " not resolved.<br />";
}


?>
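The question asks for the domain of any address given as a string, but parse_url() only reports a "host" component when the input has a scheme or a leading "//"; for "www.google.co.uk" it returns the whole string as "path", so getDomain() cannot resolve it. The sketch below covers steps 1 and 2 without any DNS lookups: it prepends "//" to scheme-less input and lists the candidates in the top-down order the loop checks them. candidateDomains() is a hypothetical helper, not part of the answer, and the sketch assumes PHP 7 or later.

```php
<?php
// Sketch of steps 1 and 2 only (no DNS): turn an address into the list of
// candidate domains, ordered from the TLD up to the full hostname.
// candidateDomains() is a hypothetical helper name, not part of the answer.
function candidateDomains(string $address): array
{
    // parse_url() only reports a "host" when a scheme (or a leading "//")
    // is present, so prepend "//" to scheme-less input like "www.google.co.uk".
    if (strpos($address, "://") === false && strpos($address, "//") !== 0) {
        $address = "//" . $address;
    }
    $host = parse_url($address, PHP_URL_HOST);

    // Drop a trailing dot from fully qualified names such as "example.com.".
    $host = is_string($host) ? rtrim($host, ".") : "";
    if ($host === "") {
        return [];
    }

    $labels = explode(".", $host);
    $candidates = [];
    for ($i = count($labels) - 1; $i >= 0; $i--) {
        // Labels $i .. end, e.g. "uk", then "co.uk", then "google.co.uk".
        $candidates[] = implode(".", array_slice($labels, $i));
    }
    return $candidates;
}
```

For example, candidateDomains("www.google.co.uk") yields "uk", "co.uk", "google.co.uk" and "www.google.co.uk", in that order; feeding each entry to checkdnsrr() reproduces the loop in getDomain().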

Output of the above example:

Searching Domain for 'http://www.stackoverflow.com': stackoverflow.com
Searching Domain for 'http://www.google.co.uk': google.co.uk
Searching Domain for 'http://books.google.co.uk': google.co.uk
Searching Domain for 'http://a.b.c.google.co.uk': google.co.uk
Searching Domain for 'http://www.nominet.org.uk/intelligence/statistics/registration/': nominet.org.uk
Searching Domain for 'http://invalid.fail.pooo': not resolved.
Searching Domain for 'http://AnotherOneThatShouldFail.com': not resolved.

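This is only a very limited set of test cases, but I cannot think of a case where a domain would have no A record.

As a nice side effect, this also validates the URL instead of relying only on a theoretically valid format, as the last example shows.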
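The answer notes that getDomain() could return the domain instead of printing it. A minimal sketch of that change follows, assuming PHP 7.1 or later. The function name findDomain() and its $resolve parameter are hypothetical: $resolve maps a hostname to a list of IPv4 addresses (empty if it does not resolve), and by default it uses checkdnsrr() and gethostbynamel() much like the answer does. Passing the resolver in lets the three steps be exercised without network access, and a null result doubles as the URL validation described above.

```php
<?php
// Sketch: return-based variant of getDomain() with an injectable resolver.
// findDomain() and the $resolve parameter are hypothetical names; $resolve
// maps a hostname to a list of IPv4 addresses (empty if it does not resolve).
function findDomain(string $url, ?callable $resolve = null): ?string
{
    if ($resolve === null) {
        // Default: live DNS (require an A record, then look up the addresses).
        $resolve = function (string $host): array {
            if (!checkdnsrr($host, "A")) {
                return [];
            }
            // gethostbynamel() returns false when the name does not resolve.
            return gethostbynamel($host) ?: [];
        };
    }

    // Step 1: hostname (null for malformed or scheme-less input).
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === "") {
        return null;
    }
    $labels = explode(".", $host);

    // Step 3 reference list: what the resolver returns for a name under the
    // same TLD that should not exist (non-empty if the ISP rewrites NXDOMAIN).
    $bogusIps = $resolve("iiiiiiiiiiiiiiiiiinvaliddomain." . end($labels));

    // Step 2: walk from the TLD towards the full hostname.
    for ($i = count($labels) - 1; $i >= 0; $i--) {
        $candidate = implode(".", array_slice($labels, $i));
        $ips = $resolve($candidate);
        if ($ips !== [] && !in_array($ips[0], $bogusIps, true)) {
            return $candidate;
        }
    }
    return null;
}
```

With the default resolver, findDomain("http://www.google.co.uk") performs live DNS lookups, so its result depends on the network and resolver in use, just as the output above does.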

Best, dognose