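Get the domain name from any type of URL format? - PHP to Ruby

Time: 2011-11-16 10:27:16

Tags: ruby-on-rails ruby ruby-on-rails-3 ruby-on-rails-3.1

I have a PHP function that correctly extracts the domain name (without any subdomain) from any given URL variant. I'm new to Ruby and am having a hard time getting it to work there: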

function get_domain_name( $url )
{
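    # eregi() was removed in PHP 7; preg_match() with the "i" flag is the case-insensitive equivalent
    preg_match( "#https?://([a-zA-Z0-9.-]*)/?.*#i", $url, $domain );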
    $domain = explode( ".", $domain[1] );

    if ( strlen( end($domain) ) == 2 && ( strlen($domain[count($domain)-2]) == 3 || strlen($domain[count($domain)-2]) == 2 )  )
    {
        # special case domains -- ex: co.uk .in .ca
        return strtolower( $domain[count($domain)-3] . "." . $domain[count($domain)-2] . "." . end( $domain ) );
    }
    else
    {
        # regular .com type domains -- three or more letters
        return strtolower( $domain[count($domain)-2] . "." . end( $domain ) );
    }
}

Is there anything in Rails that does the same thing?

Update

Thanks to @BenW, this is what I ended up with:

def extract_domain(url)
  # Optional http:// or https:// scheme, optional literal "www." prefix, then the host
  if domain = url.match(/^(https?:\/\/)?(www\.)?([a-zA-Z0-9.-]*)\/?.*/i)
    domain = domain[3].split('.')
    if (domain.last.length == 2) && (domain[-2].length == 3 || domain[-2].length == 2)
      # special case domains -- ex: co.uk .in .ca
      domain[-3..-1].join('.')
    else
      # regular .com type domains -- three or more letters
      domain[-2..-1].join('.')
    end
  end
end

It accepts all of these formats:

http://www2.google.com
www2.google.com
http://www.google.com
http://www.google.co.uk
www.google.com
google.co.uk
http://some.long.ass.subdomain.google.com

2 Answers:

Answer 0: (score: 6)

Use Addressable and take advantage of Ruby's String#slice:

require 'addressable/uri'

def domain_name(uri)
  Addressable::URI.heuristic_parse(uri, :scheme => "http") \
    .host[/\w+\.\w+(\.\w{2})?\Z/]
end

domain_name("stackoverflow.com") # => stackoverflow.com
domain_name("www.stackoverflow.com") # => stackoverflow.com
domain_name("http://stackoverflow.com") # => stackoverflow.com
domain_name("thing.com.au") # => thing.com.au
domain_name("some.thing.com.au") # => thing.com.au
domain_name("police.gov.uk") # => police.gov.uk
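
Note that the length-based pattern treats any two-letter ending as part of a two-level suffix, so a host such as www.google.ca would come back whole. Here is a minimal sketch of one alternative, assuming a small hand-written list of two-level suffixes. The `TWO_LEVEL_SUFFIXES` constant and the `registrable_domain` name are hypothetical, and the list is deliberately incomplete; a real application would more likely consult the Public Suffix List.

```ruby
# Hypothetical, deliberately incomplete list of two-level public suffixes.
TWO_LEVEL_SUFFIXES = %w[co.uk gov.uk org.uk com.au net.au co.jp co.in].freeze

# Returns the registrable domain for a bare host name (no scheme, no path),
# or nil when the host has fewer than two labels.
def registrable_domain(host)
  labels = host.downcase.split('.')
  return nil if labels.length < 2

  # Keep three labels only when the last two form a known two-level suffix.
  keep = TWO_LEVEL_SUFFIXES.include?(labels.last(2).join('.')) ? 3 : 2
  labels.last(keep).join('.')
end

registrable_domain("www.google.ca")      # => "google.ca"
registrable_domain("some.thing.com.au")  # => "thing.com.au"
registrable_domain("police.gov.uk")      # => "police.gov.uk"
```

The trade-off is that an explicit list has to be maintained, whereas the length check needs no data but guesses wrong for two-letter country codes registered directly at the second level.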

Answer 1: (score: 1)

As far as I know there is nothing built in, but a direct port is simple:

require 'uri'

# Expects a URL that includes a scheme (e.g. "http://..."); without one,
# URI.parse returns a nil host and the split below fails.
def extract_domain(url)
  domain = URI.parse(url).host.split('.')

  raise ArgumentError, "Invalid host" if domain.length < 2

  if domain[-1].length == 2 && (domain[-2].length == 3 || domain[-2].length == 2)
    domain[-3..-1].join('.')
  else
    domain[-2..-1].join('.')
  end
end
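
`URI.parse` only fills in `host` when the string carries a scheme, so input such as `www.google.com` yields a `nil` host. Here is a minimal sketch of one way around that, assuming it is acceptable to default a missing scheme to `http://` (the `host_of` helper name is hypothetical):

```ruby
require 'uri'

# Hypothetical helper: returns the host of a URL, tolerating a missing scheme.
def host_of(url)
  # Prepend a default scheme so URI.parse treats the first segment as a host
  # rather than as a relative path.
  url = "http://#{url}" unless url =~ %r{\A[a-z][a-z0-9+.-]*://}i
  URI.parse(url).host
end

host_of("www.google.com")            # => "www.google.com"
host_of("https://google.co.uk/path") # => "google.co.uk"
```

The returned host can then be split on `.` and trimmed with whichever suffix rule is preferred.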