Question

How can I split a domain name so that I get back its name and its extension?

Answer 1

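Wow, there are a lot of bad answers here. You can only do this if you know what's on the Public Suffix List. If you are using split, a regex, or anything along those lines, you're doing it wrong.

Luckily, this is Python, and there's a library for it: https://pypi.python.org/pypi/tldextract

From their README: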

>>> import tldextract
>>> tldextract.extract('http://forums.news.cnn.com/')
ExtractResult(subdomain='forums.news', domain='cnn', suffix='com')

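ExtractResult is a namedtuple, so it's easy to work with.

The advantage of using a library like this is that it keeps up with additions to the Public Suffix List, so you don't have to.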
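
To show what a namedtuple result buys you, here is a minimal sketch using a stand-in class with the same three field names as the README output above. `ExtractResultStandIn` is a made-up name for illustration, not tldextract's own class, and newer releases of the library may return a result with a different shape.

```python
from collections import namedtuple

# Stand-in with the same field names as the README output above.
# This is NOT tldextract's own class, only an illustration.
ExtractResultStandIn = namedtuple(
    'ExtractResultStandIn', ['subdomain', 'domain', 'suffix'])

result = ExtractResultStandIn(subdomain='forums.news', domain='cnn', suffix='com')

# Fields can be read by name...
name = result.domain
extension = result.suffix

# ...or unpacked like any other tuple.
subdomain, domain, suffix = result

# Rejoin the name and its extension.
registered = '.'.join([domain, suffix])
print(name, extension, registered)
```

So the name and the extension the question asks for are just the `domain` and `suffix` fields.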

Answer 2

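Depending on your application, be a bit wary of simply grabbing the part after the last '.'. That works fine for .com, .net, .org and so on, but it will likely fall over for many country code TLDs, e.g. bit.ly or google.co.uk.

(By which I mean 'bit.ly' would probably prefer to be identified including the .ly TLD, whereas google probably doesn't want to be identified with a spurious .co remainder. Whether that matters obviously depends on what you're doing.)

In those complicated cases... well, I suspect you've got your work cut out for you!

A robust answer will probably depend on how you're gathering/storing your domains and what you really want back as the "name".

For example, if you've got a set of domain names with no subdomain information, then you could do the opposite of what's suggested above and just take the first part: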

>>> "stackoverflow.com".split('.')[0]
'stackoverflow'
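
To make the trade-off concrete, here is a small sketch that runs both naive splits over the hostnames mentioned above. The helper names `first_label` and `after_last_dot` are made up for this illustration.

```python
def first_label(host):
    """Return the text before the first '.'."""
    return host.split('.')[0]


def after_last_dot(host):
    """Return the text after the final '.', i.e. the last label only."""
    return host.rsplit('.', 1)[-1]


for host in ('stackoverflow.com', 'bit.ly', 'google.co.uk',
             'www.stackoverflow.com'):
    print(host, '->', first_label(host), '/', after_last_dot(host))
```

For google.co.uk the last label is only 'uk', so the 'co' part is lost, and for www.stackoverflow.com the first label is 'www' rather than the name you probably wanted.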

Answer 3

domain = 'subdomain.domain.ext'
name, ext = domain.split('.')[-2:]
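
One thing to watch with the unpacking above: it raises ValueError when the string has fewer than two labels. A guarded variant might look like this (`split_last_two` is a hypothetical helper name, and the approach still knows nothing about multi-part suffixes):

```python
def split_last_two(domain):
    """Return (name, ext) from the last two dot-separated labels, or None."""
    parts = domain.split('.')
    if len(parts) < 2:
        # Nothing to unpack, e.g. a bare host such as 'localhost'.
        return None
    name, ext = parts[-2:]
    return name, ext


print(split_last_two('subdomain.domain.ext'))
print(split_last_two('localhost'))
print(split_last_two('google.co.uk'))
```

The last call returns 'co' as the name, which is the same country-code problem described in the other answers.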

Answer 4

Do you mean an internet domain name, such as www.stackoverflow.com? If so, then just use:

>>> 'www.stackoverflow.com'.rsplit('.', 1)
['www.stackoverflow', 'com']

Answer 5

I guess you will find the urlparse module interesting: http://docs.python.org/library/urlparse.html
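
In Python 3 the same functionality lives in urllib.parse. A minimal sketch of using it to get from a full URL to a bare hostname, which you would still have to split into name and extension yourself (`hostname_from_url` is a made-up helper name):

```python
from urllib.parse import urlparse


def hostname_from_url(url):
    """Return the host part of a URL, or None if the URL has no network location."""
    return urlparse(url).hostname


print(hostname_from_url('http://forums.news.cnn.com/'))
print(hostname_from_url('https://www.msn.com/d/1234.php?=https://www.msn.com?+'))
# Without a scheme the whole string is parsed as a path, so there is no hostname.
print(hostname_from_url('www.stackoverflow.com'))
```

This only strips the scheme, path and query. It does not know where the registered name ends and the suffix begins.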

Answer 6

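In general, it's not easy to work out where the user-registered bit ends and the registry bit begins. For example: a.com, b.co.uk, c.us, d.ca.us, e.uk.com, f.pvt.k12.wy.us...

The nice people at Mozilla have a project dedicated to listing the domain suffixes under which the public can register domains: http://publicsuffix.org/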
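
As a rough sketch of how a suffix list gets used, the code below does a longest-suffix match against a tiny hand-picked set. The set is built from the examples above and is an assumption for illustration only, not a copy of the real list. The real list also has wildcard and exception rules, which this sketch ignores. `SUFFIXES` and `split_host` are made-up names.

```python
# Hand-picked suffixes based on the examples above. This is an illustrative
# stand-in, NOT the real Public Suffix List.
SUFFIXES = {'com', 'co.uk', 'us', 'ca.us', 'uk.com', 'pvt.k12.wy.us'}


def split_host(host, suffixes=SUFFIXES):
    """Return (subdomain, name, suffix) using the longest matching suffix."""
    labels = host.lower().strip('.').split('.')
    # Try the longest candidate first by dropping one leading label at a time.
    for i in range(1, len(labels)):
        candidate = '.'.join(labels[i:])
        if candidate in suffixes:
            return '.'.join(labels[:i - 1]), labels[i - 1], candidate
    # Unknown suffix: fall back to treating the last label as the suffix.
    if len(labels) >= 2:
        return '.'.join(labels[:-2]), labels[-2], labels[-1]
    return '', labels[0], ''


for host in ('a.com', 'b.co.uk', 'c.us', 'd.ca.us', 'e.uk.com',
             'f.pvt.k12.wy.us'):
    print(host, '->', split_host(host))
```

With this set, every example above comes back with a single-letter name and the full suffix, which is exactly what a plain split on '.' cannot give you.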

Answer 7

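As other commenters have pointed out, there's no surefire way of doing this short of having a dynamically updated list of TLDs and gTLDs. What works for google.com might not work for google.co.uk, or something.co.xx, or something.com.xx. Pretty much anything could be in a TLD or a gTLD, and who knows what the future holds?

So there are two very different approaches:

1. Use a library that has a regularly updated list of TLDs and gTLDs, such as tldextract.
2. Use an algorithm that you know will fail on some edge cases, but aim for as few of them as possible.

In my experience, the following satisfies #2 well, assuming you've already stripped off the protocol and path: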

def domain_from_hostname(host):
    # Assume any TLD business will be dealt with in the last 7 bytes
    # This won't work for:
    # - Subdomains on supershort domains with short TLDs
    # - Domains with TLDs over 7 bytes long (".diamonds" anyone?)
    if len(host) < 8:
        # str.strip('www.') would strip any leading/trailing 'w' or '.' characters,
        # so remove only a literal "www." prefix instead
        return host[4:] if host.startswith('www.') else host

    host_left = host[:-7].split('.')[-1]
    return '%s%s' % (host_left, host[-7:])

Try it with some oddballs: .com.au, .media, .in, .中信, etc.

Answer 8

If you want to get the last part of the domain name, you can do:

subdomain, _, domain = fqdn.rpartition('.')
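
A short sketch of how rpartition behaves, including the case with no dot at all, where it returns an empty head instead of raising (`split_last_label` is a made-up helper name):

```python
def split_last_label(fqdn):
    """Split at the last '.', returning (everything_before, last_label)."""
    head, _, tail = fqdn.rpartition('.')
    return head, tail


print(split_last_label('www.stackoverflow.com'))
print(split_last_label('localhost'))
```

Unlike unpacking the result of rsplit('.', 1), this always yields exactly two values, so the unpacking never fails on a bare hostname.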

Answer 9

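Here's what I came up with. Nothing fancy. It works for me, although I do believe it sometimes gives weird results when there are characters like ? or + in the URL. I still don't understand why.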

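def get_dot_com_domain(url):
    # Return the leading 'scheme://host.com' part of url, or None if not found.
    # find() is used instead of rfind() so that a URL embedded in the query
    # string (after '?') is not picked up in place of the real host.
    notfound = -1
    start = url.find('http')  # matches both http:// and https://
    com = url.find('.com')
    if start != notfound and com != notfound:
        return url[start:com + len('.com')]
    return None

scheme = 'https://www.msn.com/d/1234.php?=https://www.msn.com?+'
domain = get_dot_com_domain(scheme)  # 'https://www.msn.com'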

# Here we can grab the double suffix. This one has not been fully tested.

def getdoublesuffix(domain):
    '''
    :description: returns double dot TLD suffix endings or returns -1
    '''
    # 'www.domain.co.com' is split into
    # ['www.domain', 'co', 'com']
    dots = domain.rsplit(sep='.', maxsplit=2)
    # Three parts means the name contains at least two dots.
    if len(dots) == 3:
        # co.com
        return '{0}.{1}'.format(dots[1], dots[2])
    # Return -1 to signal that the domain does not end in two-dot notation.
    return -1
