Question

I have a string:

link = "http://www.this_is_my_perfect_url.com/blah_blah/blah_blah?=trololo"

I have a function that returns the domain name from that URL or, if none is found, returns '':

def get_domain(url):
    domain_regex = re.compile(r"\:\/\/(.*?)\/|$")
    return re.findall(domain_regex, str(url))[0].replace('www.', '')

get_domain(link)

It returns:

this_is_my_perfect_url.com

If the regex does not match, the |$ alternative makes it return ''.

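Is it possible to implement a default value of Error inside the regular expression, so that I do not have to do any checking inside the function?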

So if link = "there_is_no_domain_in_here", the function returns Error instead of ''.

Answer 1

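As mentioned in the comments above, there is nothing you can set in the regex to do this for you. You can, however, check whether the output of re.findall, after the extra formatting is applied, is empty. If it is empty, no match was found, so return Error.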

import re
link = "http://www.this_is_my_perfect_url.com/blah_blah/blah_blah?=trololo"

def get_domain(url):
    domain_regex = re.compile(r"\:\/\/(.*?)\/|$")

    # Take the first match (a string, '' if only |$ matched) and strip 'www.'
    match = re.findall(domain_regex, str(url))[0].replace('www.', '')

    # Return the match, or 'Error' if it is empty
    return match or 'Error'

print(get_domain(link))
print(get_domain('there_is_no_domain_in_here'))

The output will be:

this_is_my_perfect_url.com
Error
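
As a variation on the same idea, here is a minimal sketch that uses re.search with a single capture group instead of findall plus |$, so that "no match" shows up as None rather than as an empty string. The default parameter and the DOMAIN_RE name are assumptions added for illustration and are not part of the original answer.

```python
import re

# Raw string avoids invalid-escape warnings; the group captures the host part.
DOMAIN_RE = re.compile(r"://(.*?)/")

def get_domain(url, default="Error"):
    """Return the domain of url, or default when no '://host/' part is found."""
    match = DOMAIN_RE.search(str(url))
    if match is None:
        return default
    return match.group(1).replace("www.", "")

print(get_domain("http://www.this_is_my_perfect_url.com/blah_blah/blah_blah?=trololo"))
print(get_domain("there_is_no_domain_in_here"))
```

Like the original pattern, this still requires a slash after the host, so a URL with no path at all would fall through to the default.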

Answer 2

Just to put my two cents in: the lazy quantifier (.*?) combined with an alternation (|$) is very inefficient. You can vastly improve your expression by using this instead:

://[^/]+
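
To make the behaviour of this pattern concrete, the short sketch below runs it against the link from the question. Note that the bare pattern includes the leading ://, so a capture group (an addition here, not part of the answer's pattern) is one way to keep only the host.

```python
import re

link = "http://www.this_is_my_perfect_url.com/blah_blah/blah_blah?=trololo"

# The bare pattern matches the separator as well as the host.
print(re.search(r"://[^/]+", link).group(0))    # ://www.this_is_my_perfect_url.com

# A capture group keeps only the host.
print(re.search(r"://([^/]+)", link).group(1))  # www.this_is_my_perfect_url.com

# With no "://" in the input, search returns None rather than an empty string.
print(re.search(r"://([^/]+)", "there_is_no_domain_in_here"))  # None
```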

Additionally, as of Python 3.8 you can use the walrus operator, as in:

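import re

def get_domain(your_string):
    if (m := re.search("://[^/]+", your_string)) is not None:
        return m.group(0)  # found something (the match includes the leading "://")
    else:
        return "Error"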

And no, it is not possible using regular expressions alone. A regex cannot return a string that is not there in the first place.
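
Since the default has to come from Python rather than from the pattern, one compact option is to let an empty findall result fall back to a one-element list. This is only a sketch of that idea; the capture group and the or ["Error"] fallback are assumptions layered on top of the pattern above.

```python
import re

def get_domain(url):
    # findall returns [] when nothing matches, so `or` supplies the default list.
    return (re.findall(r"://([^/]+)", str(url)) or ["Error"])[0].replace("www.", "")

print(get_domain("http://www.this_is_my_perfect_url.com/blah_blah/blah_blah?=trololo"))
print(get_domain("there_is_no_domain_in_here"))
```

This keeps the function body to a single expression, at the cost of being a little harder to read than an explicit if.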

Answer 3

Why not use urlparse to get the domain?

# Python 2:
# from urlparse import urlparse
# Python 3:
from urllib.parse import urlparse


def get_domain(url):
    parsed_uri = urlparse(url)
    domain = parsed_uri.netloc
    # netloc is '' when the URL has no scheme/authority part
    return domain or "ERROR"

url = 'there_is_no_domain_in_here'
print(get_domain(url))
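
One thing to keep in mind with this approach: urlparse only fills in netloc when the string contains a // authority part, so a bare www.example.com/path is treated as a path and yields the fallback. The sketch below illustrates that and also strips a leading www. to match the question's output. The default parameter and the example.com URL are hypothetical, added only for illustration.

```python
from urllib.parse import urlparse

def get_domain(url, default="ERROR"):
    # hostname is None when the URL has no "//" authority part.
    host = urlparse(url).hostname
    if not host:
        return default
    # Strip only a leading "www.", unlike str.replace, which removes it anywhere.
    return host[4:] if host.startswith("www.") else host

print(get_domain("http://www.this_is_my_perfect_url.com/blah_blah/blah_blah?=trololo"))
print(get_domain("www.example.com/path"))   # no scheme, so no netloc
print(get_domain("there_is_no_domain_in_here"))
```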

Return "Error" if the regex finds no match
