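I want to be able to match domains according to the following rules:
I already have a Java-style regular expression; I just need the Python-style equivalent: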
^((?!-)[A-Za-z0-9-]{1,63}(?<!-)\\.)+[A-Za-z]{2,6}$
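I couldn't find a Python regular expression for this, because everyone wants to use urlparse. I don't need to split a URL into domain, port, TLD and so on; I only need to do a simple domain replacement, so a regular expression should be the right tool.
What I tried: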
expectedstring = re.sub(r"^((?!-)[A-Za-z0-9-]{1,63}(?<!-)\\.)+[A-Za-z]{2,6}$" , "XXX" , string)
Example string:
string = "This is why this domain example.com will never be the same after some years, it might just be example.co.uk but will never get to example.-com. Documents could be located in this specific location http://en.example.com/documents/print.doc as you probably already know."
expectedstring = "This is why this domain XXX will never be the same after some years, it might just be XXX but will never get to example.-com. Documents could be located in this specific location http://XXX/documents/print.doc as you probably already know."
List of valid domain names
List of invalid domain names, and why.
Answer 0 (score: 0)
I ran a test against the given domain name lists (Python 2.7.x):
import re
valid_domains = """
www.google.com
google.com
mkyong123.com
mkyong-info.com
sub.mkyong.com
sub.mkyong-info.com
mkyong.com.au
g.co
mkyong.t.t.co
"""
invalid_domains = """
mkyong.t.t.c
mkyong,com
mkyong
mkyong.123
.com
mkyong.com/users
-mkyong.com
mkyong-.com
sub.-mkyong.com
sub.mkyong-.com
"""
valid_names = valid_domains.split()
invalid_names = invalid_domains.split()
# match a 1-character label or a label of 2+ characters; 1 to 3 labels before the TLD
pattern = r'^([A-Za-z0-9]\.|[A-Za-z0-9][A-Za-z0-9-]{0,61}[A-Za-z0-9]\.){1,3}[A-Za-z]{2,6}$'
print('checking valid domain names ============')
for name in valid_names:
    # pad the name to 50 columns, then show whether the pattern matched
    print('%s %s' % (name.ljust(50), ('True' if re.match(pattern, name) else 'False').rjust(5)))
print('\nchecking invalid domain names ============')
for name in invalid_names:
    print('%s %s' % (name.ljust(50), ('True' if re.match(pattern, name) else 'False').rjust(5)))
Output:
checking valid domain names ============
www.google.com True
google.com True
mkyong123.com True
mkyong-info.com True
sub.mkyong.com True
sub.mkyong-info.com True
mkyong.com.au True
g.co True
mkyong.t.t.co True
checking invalid domain names ============
mkyong.t.t.c False
mkyong,com False
mkyong False
mkyong.123 False
.com False
mkyong.com/users False
-mkyong.com False
mkyong-.com False
sub.-mkyong.com False
sub.mkyong-.com False
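For comparison, the Java-style pattern from the question can also be carried over to Python almost unchanged. The sketch below assumes the only difference is the escaping of the dot: Java source needs `\\.` because the string literal consumes one backslash, while a Python raw string needs only `\.`. The names `JAVA_STYLE`, `DOUBLED` and `is_domain` are made up for this illustration.

```python
import re

# Port of the Java-style pattern from the question: a single backslash
# before the dot is enough inside a Python raw string.
JAVA_STYLE = re.compile(r'^((?!-)[A-Za-z0-9-]{1,63}(?<!-)\.)+[A-Za-z]{2,6}$')

# Same pattern with the doubled backslash kept, as in the question's re.sub
# call. In a raw string "\\." means: a literal backslash, then any character.
DOUBLED = re.compile(r'^((?!-)[A-Za-z0-9-]{1,63}(?<!-)\\.)+[A-Za-z]{2,6}$')

valid = ("www.google.com google.com mkyong123.com mkyong-info.com "
         "sub.mkyong.com sub.mkyong-info.com mkyong.com.au g.co "
         "mkyong.t.t.co").split()
invalid = ("mkyong.t.t.c mkyong,com mkyong mkyong.123 .com mkyong.com/users "
           "-mkyong.com mkyong-.com sub.-mkyong.com sub.mkyong-.com").split()


def is_domain(name, pattern=JAVA_STYLE):
    """Return True if the whole string matches the given domain pattern."""
    return pattern.match(name) is not None


for name in valid + invalid:
    print('%-50s %5s' % (name, is_domain(name)))
```

With the doubled backslash the pattern expects a literal backslash after each label, so an ordinary name such as google.com no longer matches. The `^` and `$` anchors also mean that either variant can only match a string consisting of nothing but a domain, not a domain embedded in a sentence.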
[Edit] To get the same result as the provided expectedstring, I came up with the following (it does not check for "http(s)"):
import re
# match 1 character domain name or 2+ domain name
pattern = r'(//|\s+|^)(\w\.|\w[A-Za-z0-9-]{0,61}\w\.){1,3}[A-Za-z]{2,6}'
string = "This is why this domain example.com will never be the same after some years, it might just be example.co.uk but will never get to example.-com. Documents could be located in this specific location http://en.example.com/documents/print.doc as you probably already know."
expectedstring = "This is why this domain XXX will never be the same after some years, it might just be XXX but will never get to example.-com. Documents could be located in this specific location http://XXX/documents/print.doc as you probably already know."
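# \g<1> re-inserts whatever the first group captured ("//", whitespace, or nothing at start of string)
resultstring = re.sub(pattern, r"\g<1>XXX", string)
print('resultstring: \n' + resultstring)
print('\nare they equal? ' + str(expectedstring == resultstring))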
The output is:
resultstring:
This is why this domain XXX will never be the same after some years, it might just be XXX but will never get to example.-com. Documents could be located in this specific location http://XXX/documents/print.doc as you probably already know.
are they equal? True
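The same substitution can be wrapped in a small helper. This is only a sketch built on the pattern above; `DOMAIN_IN_TEXT` and `mask_domains` are names invented here, and passing a function to `sub` instead of a `\g<1>` template is a design choice of this sketch, so that the replacement text is inserted literally.

```python
import re

# Pattern taken from the answer above: group 1 captures the delimiter that
# precedes the domain ("//", whitespace, or the start of the string).
DOMAIN_IN_TEXT = re.compile(
    r'(//|\s+|^)(\w\.|\w[A-Za-z0-9-]{0,61}\w\.){1,3}[A-Za-z]{2,6}')


def mask_domains(text, replacement='XXX'):
    """Replace each matched domain, keeping the delimiter from group 1.

    A function is used as the replacement so that `replacement` is inserted
    as-is, without backslash or group-reference processing.
    """
    return DOMAIN_IN_TEXT.sub(lambda m: m.group(1) + replacement, text)


print(mask_domains('visit example.com or http://en.example.com/docs/a.pdf'))
print(mask_domains('see (example.com)'))
```

Because a match must start at "//", at whitespace, or at the beginning of the string, a domain that directly follows other punctuation, such as the parenthesised one in the second call, is left untouched.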