Question

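I am trying to match domain names against a certificate's common name. When I look at the certificate, I see that the common name is "*.example.com". The candidate domain names might be: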

www.example.com  # A match for the leftmost label of *.example.com
example.com  # A match for the leftmost label of *.example.com
hello.example.com  # A match for the leftmost label of *.example.com
foo.bar.example.com  # Not a match for the leftmost label of *.example.com
*.*.*  # Not a match for the leftmost label of *.example.com
www.*.com  # Not a match for the leftmost label of *.example.com

I tried to build the following regular expression:

import re
common_name = "*.example.com"
regex = common_name.replace('*','.*') + '$'
url = "foo.bar.example.com"
if re.match(regex, url):
   print("yes")
else:
   print("no")

What is wrong with my regular expression?

Answer 1

Try this regular expression:

(?:^|\s)(\w+\.)?example\.com(?:$|\s)

It should match

www.example.com
hello.example.com
example.com

based on your test strings.

Full solution:

import re

common_name = "*.example.com"
# Escape the dots, then drop the leading "*\." (3 characters) and allow one optional label instead.
rxString = r'(?:^|\s)(\w+\.)?' + common_name.replace('.', r'\.')[3:] + r'(?:$|\s)'

regex = re.compile(rxString)
url = "foo.bar.example.com"

if regex.match(url):
    print("yes")
else:
    print("no")

Input:

url                
-------------------
www.example.com    
example.com        
hello.example.com  
foo.bar.example.com
*.*.*              
www.*.com

Output:

url                  |  result
-------------------  |  -----------
www.example.com      |  yes
example.com          |  yes
hello.example.com    |  yes
foo.bar.example.com  |  no
*.*.*                |  no
www.*.com            |  no
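
The table above can be reproduced with a short loop. This is a minimal sketch in Python 3 that reuses the answer's pattern unchanged; the `candidates` list and the `results` dict are names introduced here for illustration.

```python
import re

common_name = "*.example.com"
# Same construction as the answer: escape the dots, then drop the leading "*\." (3 characters).
rx_string = r'(?:^|\s)(\w+\.)?' + common_name.replace('.', r'\.')[3:] + r'(?:$|\s)'
regex = re.compile(rx_string)

candidates = [
    "www.example.com",
    "example.com",
    "hello.example.com",
    "foo.bar.example.com",
    "*.*.*",
    "www.*.com",
]

# Map each candidate to "yes" or "no", mirroring the result column of the table.
results = {url: ("yes" if regex.match(url) else "no") for url in candidates}

for url, result in results.items():
    print("{:<20} | {}".format(url, result))
```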

Answer 2

Use re.search with the regular expression '^[^.]*\.?example\.com$':

>>> import re
>>> def check_match(url):
...     if re.search(r'^[^.]*\.?example\.com$', url):
...         print(url)
... 
>>> 
>>> check_match('www.example.com')
www.example.com
>>> check_match('example.com')
example.com
>>> check_match('hello.example.com')
hello.example.com
>>> check_match('foo.bar.example.com')
>>> check_match('*.*.*')
>>> check_match('www.*.com')
>>>
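
One caveat with this pattern: because `[^.]*` and `\.?` are each optional on their own, a name such as fooexample.com also matches. A possible tightening, offered as a sketch rather than as the answer's own method, makes the label and its dot optional together. The names `LOOSE`, `STRICT` and `check_match_strict` are hypothetical.

```python
import re

# Original pattern from the answer, kept for comparison.
LOOSE = re.compile(r'^[^.]*\.?example\.com$')
# Variant: the label and its trailing dot are optional only as a unit.
STRICT = re.compile(r'^(?:[^.]+\.)?example\.com$')


def check_match_strict(url):
    """Return True if url is example.com or exactly one label below it."""
    return STRICT.search(url) is not None


for url in ['www.example.com', 'example.com', 'fooexample.com', 'foo.bar.example.com']:
    # Columns: name, result of the answer's pattern, result of the stricter variant.
    print(url, LOOSE.search(url) is not None, check_match_strict(url))
```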

Answer 3

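Exclude the . character from the wildcard part of the regex and allow any other character. You also need to add a match for https://. To do this, replace the line: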

regex = common_name.replace('*','.*') + '$'

with

regex = r'(https?://)?' + common_name.replace('*.', r'([^\.]*\.)?') + '$'

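r'(https?://)?' - allows the URL to start with https:// or http://

r'([^\.]*\.)?' - allows one optional label where the common name has *., and excludes any further . (so foo.bar.example.com is treated as invalid)

In general, all of the use cases provided will then match correctly.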
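
To make the difference concrete, the sketch below compares the question's original line with the replacement. It shows that `.*` can swallow dots, and therefore several labels, while `[^\.]*` stops at the first dot. The variable names `original` and `fixed` are introduced here for illustration only.

```python
import re

common_name = "*.example.com"

# The question's line: ".*" matches any characters, dots included.
original = common_name.replace('*', '.*') + '$'
# The answer's line: "[^\.]*" covers a single label, because it cannot cross a dot.
fixed = r'(https?://)?' + common_name.replace('*.', r'([^\.]*\.)?') + '$'

for url in ["www.example.com", "foo.bar.example.com", "https://www.example.com"]:
    # Columns: name, match with the original regex, match with the replacement.
    print(url, bool(re.match(original, url)), bool(re.match(fixed, url)))
```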

Answer 4

How about this? (Note that a bare * does not behave as expected inside a regex, so the common name is escaped first.)

import re
common_name = "*.example.com"
# escaping the string to not contain any valid regex
common_name = re.escape(common_name)
# Replacing any occurrences of the (regex-escaped) "*." with regex
regex = "^" + common_name.replace(r"\*\.", r"(\w*\.)?") + "$"
# yields the regex: ^(\w*\.)?example\.com$
url = "foo.bar.example.com"
if re.match(regex, url):
   print("yes")
else:
   print("no")

This matches the expected examples.
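
One thing to keep in mind is that `\w` does not match a hyphen, so a host such as my-host.example.com is rejected by this pattern, and `\w*` also accepts an empty label (.example.com). If that matters, the character class can be swapped. The following is a sketch under that assumption; the helper name `build_regex` and the class `[A-Za-z0-9-]+` are choices made here, not part of the answer.

```python
import re


def build_regex(common_name, label=r"[A-Za-z0-9-]+"):
    """Turn a wildcard common name into an anchored regex (hypothetical helper)."""
    escaped = re.escape(common_name)
    # After re.escape, the wildcard prefix "*." appears as "\*\.".
    return "^" + escaped.replace(r"\*\.", "(" + label + r"\.)?") + "$"


answer_regex = build_regex("*.example.com", label=r"\w*")  # same pattern as the answer
hyphen_regex = build_regex("*.example.com")                # hyphen-aware variant

for url in ["www.example.com", "my-host.example.com", ".example.com", "foo.bar.example.com"]:
    # Columns: name, match with the answer's pattern, match with the variant.
    print(url, bool(re.match(answer_regex, url)), bool(re.match(hyphen_regex, url)))
```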

Answer 5

This regular expression will handle most cases:

r'([^\.]+\.)?example\.com'

Incorporated into code:

import re

common_name = '*.example.com'
pattern = re.compile(common_name.replace('*.', r'([^\.]+\.)?', 1))

for domain in 'www.example.com', 'example.com', 'hello.example.com', 'foo.bar.example.com', '*.*.*', 'www.*.com':
    print('{}: {}'.format(domain, pattern.match(domain) is not None))

Output:

www.example.com: True
example.com: True
hello.example.com: True
foo.bar.example.com: False
*.*.*: False
www.*.com: False

Whether example.com should be accepted is debatable, but the regex above does accept it.
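
Since the bare-domain question is a policy choice, it can help to make it an explicit switch. The sketch below avoids regular expressions and compares dot-separated labels instead. The function name `matches_common_name` and the `allow_bare_domain` parameter are hypothetical, and the logic only handles a wildcard in the leftmost label. Note also that `pattern.match` anchors only the start of the string, so a longer name such as example.com.evil.test (a made-up name) would still pass the regex above; the label comparison does not have that gap.

```python
def matches_common_name(domain, common_name, allow_bare_domain=True):
    """Compare a domain with a common name label by label (illustrative sketch)."""
    domain_labels = domain.lower().split('.')
    name_labels = common_name.lower().split('.')

    if name_labels[0] != '*':
        # No wildcard: require an exact, case-insensitive match.
        return domain_labels == name_labels

    suffix = name_labels[1:]
    if domain_labels == suffix:
        # Bare domain, e.g. example.com against *.example.com.
        return allow_bare_domain

    # Exactly one extra, non-empty, non-wildcard label in front of the suffix.
    return (len(domain_labels) == len(suffix) + 1
            and domain_labels[1:] == suffix
            and domain_labels[0] not in ('', '*'))


for domain in ('www.example.com', 'example.com', 'hello.example.com',
               'foo.bar.example.com', '*.*.*', 'www.*.com'):
    print('{}: {}'.format(domain, matches_common_name(domain, '*.example.com')))
```

Passing `allow_bare_domain=False` rejects example.com while leaving the other results unchanged.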

URL common name matching - Python
