I want to write a Python regular expression that matches the following rule.
For example, "http://some.domain/a.zip" and "http://sub.some.domain/a.zip?key=value" match this pattern, while "http://www.other.domain/a.zip" and "http://www.some.domain/a.zipp" do not.
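Answer 0 (score: 3)
As others have said in the comments, it is better to use a URL parser, since URLs in particular can vary widely and a regex may miss cases. However, here is an example that does what I think you want.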
#!/usr/bin/python
import re

strings = [
    "http://some.domain/",
    "http://some.domain/a.zip",
    "http://some.domain/a.tar",
    "http://sub.some.domain/a.zip?key=value",
    "http://www.other.domain/a.zip",
    "http://www.some.domain/a.zipp0"
]

for url in strings:
    # match "http://"
    # match anything up to "some.domain/", greedy
    # match "some.domain/"
    # optionally, match anything up to .zip or .tar, greedy
    # match ".tar" or ".zip", if above optional is present
    # optionally, match a "?" after .zip/.tar, followed by anything, greedy
    # match the end of string
    if re.search(r'http://.*some\.domain/(.*\.(zip|tar)(\?.*)?)?$', url):
        print("url: {} MATCHES".format(url))
    else:
        print("url: {} DOESN'T MATCH".format(url))
Output:
./url.py
url: http://some.domain/ MATCHES
url: http://some.domain/a.zip MATCHES
url: http://some.domain/a.tar MATCHES
url: http://sub.some.domain/a.zip?key=value MATCHES
url: http://www.other.domain/a.zip DOESN'T MATCH
url: http://www.some.domain/a.zipp0 DOESN'T MATCH
-stevieb
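The URL-parser approach recommended above could look roughly like the following. This is only a minimal sketch using the standard library's urllib.parse. The helper name url_matches and the two constants are hypothetical, and the rules it encodes (http scheme only, host equal to or a subdomain of some.domain, path ending in .zip or .tar) are assumptions inferred from the examples in this thread. Unlike the regex above, it rejects the bare "http://some.domain/".

```python
import posixpath
from urllib.parse import urlsplit

# Assumptions inferred from the examples in this thread, not stated rules.
ALLOWED_DOMAIN = "some.domain"
ALLOWED_EXTENSIONS = {".zip", ".tar"}


def url_matches(url):
    """Return True if url is http, on some.domain (or a subdomain),
    and its path ends in an allowed extension."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is lower-cased by urlsplit
    if parts.scheme != "http":
        return False
    # Accept the domain itself or any subdomain of it, nothing else.
    if host != ALLOWED_DOMAIN and not host.endswith("." + ALLOWED_DOMAIN):
        return False
    # The query string is already split off, so only the path is inspected.
    extension = posixpath.splitext(parts.path)[1]
    return extension in ALLOWED_EXTENSIONS


for candidate in [
    "http://some.domain/a.zip",
    "http://sub.some.domain/a.zip?key=value",
    "http://www.other.domain/a.zip",
    "http://www.some.domain/a.zipp",
]:
    print(candidate, url_matches(candidate))
```

Because the host is compared as a whole, a URL such as "http://awesome.domain/a.zip" or "http://other.domain/some.domain/a.zip" is rejected, whereas a pattern that starts with `http://.*some\.domain/` would accept both.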
Answer 1 (score: 0)
^http:\/\/(?:\w+\.)?some\.domain(?:\/\w+\.(?:zip|tar))?(?:\?\w+=\w+)?$
import re
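# re.MULTILINE lets ^ and $ anchor at the start and end of each line in test_str.
# The ur'' prefix is Python 2 only; a plain raw string works in Python 3.
p = re.compile(r'^http:\/\/(?:\w+\.)?some\.domain(?:\/\w+\.(?:zip|tar))?(?:\?\w+=\w+)?$', re.MULTILINE)
test_str = "http://some.domain/a.zip\nhttp://sub.some.domain/a.zip?key=value\nhttp://www.other.domain/a.zip\nhttp://www.some.domain/a.zipp"
print(p.findall(test_str))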
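To check URLs one at a time instead of scanning a multi-line string, the same pattern could be applied with re.fullmatch, in which case the ^ and $ anchors and re.MULTILINE are not needed. A minimal sketch; the helper name matches is hypothetical:

```python
import re

# Same pattern as above, without the anchors: fullmatch requires the
# whole string to match, so ^, $ and re.MULTILINE become unnecessary.
PATTERN = re.compile(
    r'http://(?:\w+\.)?some\.domain(?:/\w+\.(?:zip|tar))?(?:\?\w+=\w+)?'
)


def matches(url):
    """Return True if the entire url matches PATTERN."""
    return PATTERN.fullmatch(url) is not None


for candidate in [
    "http://some.domain/a.zip",
    "http://sub.some.domain/a.zip?key=value",
    "http://www.other.domain/a.zip",
    "http://www.some.domain/a.zipp",
]:
    print(candidate, matches(candidate))
```

Note that this pattern allows at most one subdomain label and only a single key=value pair made of word characters in the query string.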