I need a regular expression that gives the following result for each example, but I can't seem to get it right:
example.com yields -> nothing / empty
www.example.com yields -> nothing / empty
account.example.com yields -> account
mywww.example.com yields -> mywww
wwwboys.example.com yields -> wwwboys
cool-www.example.com yields -> cool-www
So it doesn't matter if 'www' appears inside the subdomain, but the subdomain can't be just 'www'. It may also contain hyphens.
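As a minimal sketch of this requirement (not taken from any of the answers below), a negative lookahead can reject a first label that is exactly "www" while still accepting labels that merely contain it. It assumes each input is a bare hostname with exactly one subdomain label in front of a two-label domain such as example.com; the helper name subdomain is hypothetical.

```python
import re

# The negative lookahead (?!www\.) fails only when the host starts with the exact label "www.",
# so labels such as "wwwboys" or "mywww" still pass. Hyphens are allowed in every label.
# Assumption: one subdomain label followed by a two-label domain like example.com.
PATTERN = re.compile(r"(?!www\.)([A-Za-z0-9-]+)\.[A-Za-z0-9-]+\.[A-Za-z]+")


def subdomain(host):
    """Hypothetical helper: return the subdomain label, or None when there is none."""
    match = PATTERN.fullmatch(host)
    return match.group(1) if match else None


for host in ["example.com", "www.example.com", "account.example.com",
             "mywww.example.com", "wwwboys.example.com", "cool-www.example.com"]:
    print(host, "->", subdomain(host))
```

Because re.fullmatch must consume the whole string, a nested host such as a.b.example.com also returns None; the pattern would need loosening if deeper subdomains should be accepted.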
Answer 0 (score: 1)
import re

x = """example.com yields -> nothing / empty
www.example.com yields -> nothing / empty
account.example.com yields -> account
mywww.example.com yields -> mywww
wwwboys.example.com yields -> wwwboys
cool-www.example.com yields -> cool-www"""
# Raw string so "\." reaches the regex engine unchanged; re.MULTILINE makes ^ match at every line start.
print(re.findall(r"^([A-Za-z0-9-]+)\.(?<!^www\.)[A-Za-z0-9-]+\.[A-Za-z]+", x, re.MULTILINE))
# ['account', 'mywww', 'wwwboys', 'cool-www']
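The ^ inside the lookbehind (?<!^www\.) is what lets mywww and cool-www through: the lookbehind only rejects "www." when it sits at the very start of the line. The comparison below applies the same pattern to one bare hostname at a time, with a trailing $ added here (an assumption, since the inputs no longer carry the "yields ..." text), and shows what happens when the ^ is dropped. The helper name first_label is hypothetical.

```python
import re

# Answer 0's pattern, with "$" appended because each input is a bare hostname.
WITH_ANCHOR = re.compile(r"^([A-Za-z0-9-]+)\.(?<!^www\.)[A-Za-z0-9-]+\.[A-Za-z]+$")
# The same pattern with "^" removed from the lookbehind, for comparison only.
WITHOUT_ANCHOR = re.compile(r"^([A-Za-z0-9-]+)\.(?<!www\.)[A-Za-z0-9-]+\.[A-Za-z]+$")


def first_label(pattern, host):
    """Return capture group 1 if the pattern matches the host, else None."""
    match = pattern.match(host)
    return match.group(1) if match else None


for host in ["www.example.com", "mywww.example.com", "cool-www.example.com"]:
    print(host, first_label(WITH_ANCHOR, host), first_label(WITHOUT_ANCHOR, host))
```

Without the anchor, every first label that ends in "www" (mywww, cool-www) is rejected along with plain www.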
Answer 1 (score: 1)
mystrings = """
example.com
www.example.com
account.example.com
mywww.example.com
wwwboys.example.com
cool-www.example.com
"""
junk = ["example.com", "www.example.com"]
for url in mystrings.split("\n"):
    # Skip blank lines and the hosts that should yield nothing
    if url and url.strip() not in junk:
        print("-->", url.split(".", 2)[0])
Output:
$ ./python.py
--> account
--> mywww
--> wwwboys
--> cool-www
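Answer 1 compares the whole host against a fixed junk list, so it only blanks out the exact hosts that are listed. A minimal sketch wrapping the same logic in a function (the name first_labels is hypothetical) makes that easy to check, including a www host on a domain outside the list:

```python
def first_labels(text, junk=("example.com", "www.example.com")):
    """Return the first dot-separated label of every non-blank line not in `junk`."""
    labels = []
    for line in text.splitlines():
        host = line.strip()
        # Skip blank lines and hosts that should yield nothing.
        if host and host not in junk:
            labels.append(host.split(".", 1)[0])
    return labels


sample = """
example.com
www.example.com
account.example.com
mywww.example.com
wwwboys.example.com
cool-www.example.com
"""
print(first_labels(sample))
print(first_labels("www.example.org"))
```

The second call returns ['www'] because www.example.org is not in the junk list; every base domain needs its own entries for this approach to work.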
Answer 2 (score: 0)
Here is my solution, based on ghostdog74's example:
OFF_LIMITS = ('api', 'www', 'secure', 'account')

def get_safe_subdomain_or_none(host):
    subdomain = None
    parts = host.split('.')
    # len == 3 ensures that you don't have a sub-subdomain, and that you don't have just `example.com`
    if len(parts) == 3 and parts[0] not in OFF_LIMITS:
        subdomain = parts[0]
    return subdomain
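Running the function against the question's hosts is a quick way to see its behaviour. Because 'account' is in OFF_LIMITS, account.example.com returns None here even though the question expects account; removing it from the tuple would let that host through. The function is repeated so the snippet runs on its own, using == rather than the identity test "is 3".

```python
OFF_LIMITS = ('api', 'www', 'secure', 'account')


def get_safe_subdomain_or_none(host):
    parts = host.split('.')
    # Exactly three labels: one subdomain plus a two-label domain such as example.com.
    if len(parts) == 3 and parts[0] not in OFF_LIMITS:
        return parts[0]
    return None


for host in ["example.com", "www.example.com", "account.example.com",
             "mywww.example.com", "wwwboys.example.com", "cool-www.example.com"]:
    print(host, '->', get_safe_subdomain_or_none(host))
# example.com -> None
# www.example.com -> None
# account.example.com -> None
# mywww.example.com -> mywww
# wwwboys.example.com -> wwwboys
# cool-www.example.com -> cool-www
```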