Regex to allow subdomains such as `wwwboys.domain.com`

Time: 2010-03-10 04:23:33

Tags: regex

I need a regular expression that gives the following result for each of these examples, and I can't seem to get it right:

example.com yields -> nothing / empty

www.example.com yields -> nothing / empty

account.example.com yields -> account

mywww.example.com yields -> mywww

wwwboys.example.com yields -> wwwboys

cool-www.example.com yields -> cool-www

So it doesn't matter if 'www' appears somewhere in the subdomain, but the subdomain can't be just 'www'. It may also contain hyphens.

3 answers:

Answer 0 (score: 1)

x="""example.com yields -> nothing / empty

www.example.com yields -> nothing / empty

account.example.com yields -> account

mywww.example.com yields -> mywww

wwwboys.example.com yields -> wwwboys

cool-www.example.com yields -> cool-www"""

>>> import re
>>> re.findall(r"^([A-Za-z0-9-]+)\.(?<!^www\.)[A-Za-z0-9-]+\.[A-Za-z]+", x, re.MULTILINE)
['account', 'mywww', 'wwwboys', 'cool-www']
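For checking one hostname at a time rather than scanning a block of text, a negative lookahead at the start of the pattern may be easier to read than the lookbehind above. This is only a sketch: the names `SUBDOMAIN_RE` and `extract_subdomain` are made up for illustration, and it assumes hosts of exactly three labels, as in the question.

```python
import re

# Hypothetical names, not from the original answer.
# (?!www\.) rejects a first label that is exactly "www" but still allows
# labels that merely contain it, such as "wwwboys" or "cool-www".
SUBDOMAIN_RE = re.compile(r"^(?!www\.)([A-Za-z0-9-]+)\.[A-Za-z0-9-]+\.[A-Za-z]+$")

def extract_subdomain(host):
    """Return the leading label of a three-label host, or None."""
    match = SUBDOMAIN_RE.match(host)
    return match.group(1) if match else None

hosts = [
    "example.com",
    "www.example.com",
    "account.example.com",
    "mywww.example.com",
    "wwwboys.example.com",
    "cool-www.example.com",
]
results = [extract_subdomain(h) for h in hosts]
print(results)
```

With the six hosts from the question, the first two come back as `None` and the rest return their first label.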

Answer 1 (score: 1)

mystrings="""
example.com
www.example.com
account.example.com
mywww.example.com
wwwboys.example.com
cool-www.example.com
"""

junk = ["example.com", "www.example.com"]
for url in mystrings.split("\n"):
    url = url.strip()
    # Skip blank lines, the bare domain, and the plain "www" host.
    if url and url not in junk:
        print("-->", url.split(".", 2)[0])

Output

$ ./python.py
--> account
--> mywww
--> wwwboys
--> cool-www
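The junk list above hardcodes the two hosts to skip. If the base domain is known up front, the same idea can probably be written as a small function that strips the base as a suffix. This is a sketch under that assumption; `subdomain_of` and its `base` parameter are hypothetical names, not part of the original answer.

```python
def subdomain_of(host, base="example.com"):
    """Return the single label in front of base, or None if there is none."""
    host = host.strip().lower()
    suffix = "." + base
    if not host.endswith(suffix):
        return None  # bare domain or an unrelated host
    label = host[:-len(suffix)]
    # Reject an empty label, the plain "www" host, and nested labels like "a.b".
    if not label or label == "www" or "." in label:
        return None
    return label

for host in ["example.com", "www.example.com", "wwwboys.example.com"]:
    print(host, "-->", subdomain_of(host))
```

Unlike the regex approach, this version does no character validation on the label, so it would need an extra check if hosts come from untrusted input.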

Answer 2 (score: 0)

Here is my solution, based on ghostdog74's example:

OFF_LIMITS = ('api', 'www', 'secure', 'account')

def get_safe_subdomain_or_none(host):
    subdomain = None
    parts = host.split('.')
    # Exactly 3 labels rules out both a bare `example.com` and a sub-subdomain.
    if len(parts) == 3 and parts[0] not in OFF_LIMITS:
        subdomain = parts[0]
    return subdomain
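One thing to note: `OFF_LIMITS` in this answer includes 'account', so `account.example.com` comes back as `None`, which differs from the result the question asks for. A quick check against the question's six hosts, restating the function (with `==` for the length comparison) so the block runs on its own:

```python
OFF_LIMITS = ('api', 'www', 'secure', 'account')

def get_safe_subdomain_or_none(host):
    parts = host.split('.')
    # Only a three-label host with an allowed first label yields a subdomain.
    if len(parts) == 3 and parts[0] not in OFF_LIMITS:
        return parts[0]
    return None

hosts = [
    "example.com",
    "www.example.com",
    "account.example.com",
    "mywww.example.com",
    "wwwboys.example.com",
    "cool-www.example.com",
]
checked = {h: get_safe_subdomain_or_none(h) for h in hosts}
for host, sub in checked.items():
    print(host, "-->", sub)
```

Removing 'account' from the tuple would presumably bring the behavior in line with the question.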