Question

I'm trying to write a regular expression in Python that matches either a URL (for example https://www.foo.com/) or a string that starts with "sc-domain:" but has neither https nor a path.

For example, the following entries should pass:

https://www.foo.com/
https://www.foo.com/bar/
sc-domain:www.foo.com

But the following entries should fail:

htps://www.foo.com/
https:/www.foo.com/bar/
sc-domain:www.foo.com/
sc-domain:www.foo.com/bar
scdomain:www.foo.com

Right now I'm working with the following:

^(https://*/|sc-domain:^[^/]*$)

This almost works, but it still lets submissions such as sc-domain:www.foo.com/ through. Specifically, the ^[^/]*$ part does not catch that a '/' should not pass.

Answer 1

^((?:https://\S+)|(?:sc-domain:[^/\s]+))$

You can try this.

See the demo:

https://regex101.com/r/xXSayK/2
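To check this pattern outside regex101, here is a minimal Python sketch that runs it against the sample entries from the question. The helper name is_valid is only an illustration, not something from the answer.

```python
import re

# Pattern taken verbatim from the answer above.
PATTERN = re.compile(r"^((?:https://\S+)|(?:sc-domain:[^/\s]+))$")

# Sample entries copied from the question.
SHOULD_PASS = [
    "https://www.foo.com/",
    "https://www.foo.com/bar/",
    "sc-domain:www.foo.com",
]
SHOULD_FAIL = [
    "htps://www.foo.com/",
    "https:/www.foo.com/bar/",
    "sc-domain:www.foo.com/",
    "sc-domain:www.foo.com/bar",
    "scdomain:www.foo.com",
]


def is_valid(entry):
    """Hypothetical helper: True when the entry matches the pattern."""
    return PATTERN.match(entry) is not None


for entry in SHOULD_PASS + SHOULD_FAIL:
    print(f"{entry!r:32} -> {'pass' if is_valid(entry) else 'fail'}")
```

Note that in Python `$` also matches just before a trailing newline, so if the input may end in "\n" it is probably safer to call re.fullmatch or strip the entry first.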

Answer 2

You can use this regex:

^(?:https?://www\.foo\.com(?:/\S*)*|sc-domain:www\.foo\.com)$

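Explanation:

^ - start of the string
(?: - start of a non-capturing group for the alternation
https?://www\.foo\.com(?:/\S*)* - matches a URL starting with http:// or https://, followed by www.foo.com, optionally followed by a path
| - alternation, for strings starting with sc-domain:
sc-domain:www\.foo\.com - matches sc-domain: followed by www.foo.com, with no file path allowed after it
)$ - close of the non-capturing group, then end of the string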

Regex Demo
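The explanation above can also be kept next to the pattern itself. A small sketch using re.VERBOSE, which lets the regex carry its own comments; the name ANSWER2 and the three test entries are just for illustration:

```python
import re

# Same pattern as above, spread over several lines with inline comments.
ANSWER2 = re.compile(r"""
    ^                             # start of the string
    (?:                           # non-capturing group around the alternation
        https?://www\.foo\.com    # http:// or https://, then www.foo.com
        (?:/\S*)*                 # optional path of non-whitespace characters
      |                           # or
        sc-domain:www\.foo\.com   # sc-domain: then www.foo.com, with no path
    )
    $                             # end of the string
""", re.VERBOSE)

for entry in ["https://www.foo.com/bar/", "http://www.foo.com", "sc-domain:www.foo.com/"]:
    print(entry, "->", bool(ANSWER2.match(entry)))
```

Since \S also matches "/", writing the path part as (?:/\S*)? should accept the same strings while avoiding a repetition nested inside another repetition, which can backtrack heavily on long inputs that fail to match.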

Also, I'm not quite sure whether you want to allow any arbitrary domain, but if you do, you can use this regex:

^(?:https?://(?:\w+\.)+\w+(?:/\S*)*|sc-domain:(?:\w+\.)+\w+)$

Regex Demo allowing any domain
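A quick way to see what the any-domain version accepts is to run it over a few entries. This is only a sketch; the host names below are made up for the test and are not from the question.

```python
import re

# Any-domain pattern, copied from the answer above.
ANY_DOMAIN = re.compile(
    r"^(?:https?://(?:\w+\.)+\w+(?:/\S*)*|sc-domain:(?:\w+\.)+\w+)$"
)

samples = [
    "https://www.bar.org/some/path",   # different domain, with a path
    "sc-domain:example.com",           # sc-domain: with no path
    "sc-domain:example.com/",          # a path after sc-domain: is rejected
    "https://my-site.example.com/",    # '-' is not matched by \w
]

for s in samples:
    print(s, "->", bool(ANY_DOMAIN.match(s)))
```

The last entry is worth noting: \w covers letters, digits and underscore only, so host names containing a hyphen are rejected. If such hosts should pass, [\w-]+ in place of \w+ is one possible adjustment.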

Answer 3

This expression would also do that, using two simple capturing groups that you can modify as you wish:

^(((http|https)(:\/\/www\.foo\.com)(\/.*))|(sc-domain:www\.foo\.com))$

I've also added http, which you can remove if it isn't needed.
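One detail worth checking with any pattern of this shape is where the anchors apply. Alternation has the lowest precedence, so ^A|B$ reads as (^A)|(B$) unless the alternatives are wrapped in a group. A minimal sketch with simplified patterns (the names loose and strict are only for illustration):

```python
import re

# Without an outer group, ^ belongs to the first alternative only
# and $ belongs to the second alternative only.
loose = re.compile(r"^https://www\.foo\.com/.*|sc-domain:www\.foo\.com$")

# With an outer group, both anchors apply to both alternatives.
strict = re.compile(r"^(?:https://www\.foo\.com/.*|sc-domain:www\.foo\.com)$")

entry = "not-sc-domain:www.foo.com"
print("loose :", bool(loose.search(entry)))   # second alternative is found mid-string
print("strict:", bool(strict.search(entry)))  # the whole string must match
```

Wrapping both alternatives in one group keeps the start and end anchors on both branches.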

JavaScript test

const regex = /^(((http|https)(:\/\/www\.foo\.com)(\/.*))|(sc-domain:www\.foo\.com))$/gm;
const str = `https://www.foo.com/
https://www.foo.com/bar/
sc-domain:www.foo.com
http://www.foo.com/
http://www.foo.com/bar/
`;
const subst = `$1`;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);

Test with Python

You can simply test it with Python and add the capturing groups you need:

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

# The outer group keeps ^ and $ applied to both alternatives; dots are escaped so they match literally
regex = r"^(((http|https)(:\/\/www\.foo\.com)(\/.*))|(sc-domain:www\.foo\.com))$"

test_str = ("https://www.foo.com/\n"
    "https://www.foo.com/bar/\n"
    "sc-domain:www.foo.com\n"
    "http://www.foo.com/\n"
    "http://www.foo.com/bar/\n\n"
    "htps://www.foo.com/\n"
    "https:/www.foo.com/bar/\n"
    "sc-domain:www.foo.com/\n"
    "sc-domain:www.foo.com/bar\n"
    "scdomain:www.foo.com")

# Python writes backreferences as \1 (not $1); group 1 here is the whole matched line
subst = r"\1"

# You can manually specify the number of replacements by changing the count argument
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)

if result:
    print(result)

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
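If the goal is to inspect what each group captured rather than to substitute, named groups may be easier to read. A sketch under that assumption; the group names scheme and path and the helper describe are hypothetical, not part of the answer:

```python
import re

# Same idea as the expression above, with named groups for the parts of interest.
NAMED = re.compile(
    r"^(?:(?P<scheme>https?)://www\.foo\.com(?P<path>/.*)|sc-domain:www\.foo\.com)$"
)


def describe(entry):
    """Return the captured groups as a dict, or None when the entry is rejected."""
    m = NAMED.match(entry)
    return m.groupdict() if m else None


for entry in ["https://www.foo.com/bar/", "sc-domain:www.foo.com", "sc-domain:www.foo.com/"]:
    print(entry, "->", describe(entry))
```

For an sc-domain: entry both named groups come back as None, because that alternative does not capture anything.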

Edit

Following Pushpesh's suggestion, you can use an optional quantifier (https?) and simplify it to:

^(((https?)(:\/\/www\.foo\.com)(\/.*))|(sc-domain:www\.foo\.com))$
