Question

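I want to write a Python regular expression that matches the following rules:

Starts with "http://"
The domain part ends with "some.domain"
The path part ends in a "tar" or "zip" extension
The URL may contain an optional query part, such as "?key1=value1&key2=value2"

For example, "http://some.domain/a.zip" and "http://sub.some.domain/a.zip?key=value" match this pattern, while "http://www.other.domain/a.zip" and "http://www.some.domain/a.zipp" do not.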

Answer 1

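As others have said in the comments, you're better off using a URL parser, because URLs in particular can vary a great deal and a regex can easily miss cases. That said, here is an example that does what I think you want.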

#!/usr/bin/python

import re

strings = [ 
            "http://some.domain/", 
            "http://some.domain/a.zip", 
            "http://some.domain/a.tar",
            "http://sub.some.domain/a.zip?key=value", 
            "http://www.other.domain/a.zip", 
            "http://www.some.domain/a.zipp0"
            ]

for url in strings:

    # match "http://"
    # match anything up to "some.domain/", greedy
    # match "some.domain/"
    # optionally, match anything up to .zip or .tar, greedy
    # match ".tar" or ".zip", if above optional is present
    # optionally, match a "?" after .zip/.tar, followed by anything, greedy
    # match the end of string

    if re.search(r'http://.*some\.domain/(.*\.(zip|tar)(\?.*)?)?$', url):
        print("url: {} MATCHES".format(url))
    else:
        print("url: {} DOESN'T MATCH".format(url))

Output:

./url.py
url: http://some.domain/ MATCHES
url: http://some.domain/a.zip MATCHES
url: http://some.domain/a.tar MATCHES
url: http://sub.some.domain/a.zip?key=value MATCHES
url: http://www.other.domain/a.zip DOESN'T MATCH
url: http://www.some.domain/a.zipp0 DOESN'T MATCH

-stevieb
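Answer 1 opens by recommending a URL parser over a regex. A minimal sketch of that approach with the standard library's urllib.parse.urlsplit is shown here; the function name is_archive_url is hypothetical, and the sketch assumes "ends in tar or zip" means the path ends with ".tar" or ".zip". Because it compares the parsed hostname instead of searching the whole string, it rejects hosts such as "notsome.domain" (which the regex above accepts), and it also rejects a bare "http://some.domain/".

```python
from urllib.parse import urlsplit


def is_archive_url(url):
    """Sketch: check the question's rules on the parsed URL parts."""
    parts = urlsplit(url)
    # Rule 1: the scheme must be plain "http"
    if parts.scheme != "http":
        return False
    # Rule 2: the host is some.domain itself or any subdomain of it
    host = parts.hostname or ""  # hostname is lowercased and excludes any port
    if host != "some.domain" and not host.endswith(".some.domain"):
        return False
    # Rule 3: the path (query string already split off) ends with .zip or .tar
    return parts.path.endswith((".zip", ".tar"))


for url in [
    "http://some.domain/a.zip",
    "http://sub.some.domain/a.zip?key=value",
    "http://www.other.domain/a.zip",
    "http://www.some.domain/a.zipp",
    "http://notsome.domain/a.zip",
]:
    print(url, is_archive_url(url))
```

The optional query part needs no special handling here: urlsplit puts it in parts.query, so it never affects the path check.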

Answer 2

^http:\/\/(?:\w+\.)?some\.domain(?:\/\w+\.(?:zip|tar))?(?:\?\w+=\w+)?$

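import re

# Python 3: use an r'' prefix (ur'' is Python 2 only and is a SyntaxError in Python 3)
# re.MULTILINE makes ^ and $ match at the start/end of each line in test_str
p = re.compile(r'^http:\/\/(?:\w+\.)?some\.domain(?:\/\w+\.(?:zip|tar))?(?:\?\w+=\w+)?$', re.MULTILINE)
test_str = u"http://some.domain/a.zip\nhttp://sub.some.domain/a.zip?key=value\nhttp://www.other.domain/a.zip\nhttp://www.some.domain/a.zipp"

print(p.findall(test_str))
# ['http://some.domain/a.zip', 'http://sub.some.domain/a.zip?key=value']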
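Answer 2's pattern allows at most one subdomain label and only a single key=value pair in the query, so the question's own example "?key1=value1&key2=value2" would be rejected. A minimal sketch of a variant follows; it is not part of the original answer, the names URL_RE and matches are hypothetical, and it still assumes keys, values, and host labels consist only of \w characters. It checks one URL at a time with re.fullmatch instead of scanning a multi-line string.

```python
import re

# Hypothetical variant of Answer 2's regex:
#   (?:\w+\.)*                    zero or more subdomain labels, e.g. "sub." or "a.b."
#   (?:/\w+\.(?:zip|tar))?        optional path ending in .zip or .tar (as in Answer 2)
#   (?:\?\w+=\w+(?:&\w+=\w+)*)?   optional query with one or more key=value pairs
URL_RE = re.compile(
    r"http://(?:\w+\.)*some\.domain"
    r"(?:/\w+\.(?:zip|tar))?"
    r"(?:\?\w+=\w+(?:&\w+=\w+)*)?"
)


def matches(url):
    # fullmatch anchors both ends, so neither ^/$ nor re.MULTILINE is needed
    return URL_RE.fullmatch(url) is not None


for url in [
    "http://some.domain/a.zip",
    "http://sub.some.domain/a.zip?key=value",
    "http://sub.some.domain/a.zip?key1=value1&key2=value2",
    "http://www.other.domain/a.zip",
    "http://www.some.domain/a.zipp",
]:
    print(url, matches(url))
```

Testing each URL separately avoids relying on line boundaries inside one big string, which is usually closer to how such a check is used in practice.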
