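I am trying to solve a string-matching problem with a regular expression. I need to match URLs of this form:
http://soundcloud.com/okapi23/dont-turn-your-back/
and I need to reject URLs of this form:
http://soundcloud.com/okapi23/sets/happily-reversed/
The trailing '/' is optional in both cases.
So far I have come up with this:
http(s)?://(www\.)?soundcloud\.com/.+/(?!sets)\b(/.+)?
but it fails.
Any suggestions? Is there a library that would simplify the task (for example, by making the trailing slash optional)?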
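One plausible reading of "fails", offered as an assumption rather than something the question states: the pattern is unanchored and ".+" can span several path segments, so the "(?!sets)" check can end up being evaluated somewhere other than in front of the "sets" segment. A minimal JavaScript sketch (variable names are illustrative) that runs the attempted pattern against both sample URLs:

```javascript
// The pattern from the question, written as a JavaScript regex literal
// (forward slashes escaped; otherwise unchanged).
const attempt = /http(s)?:\/\/(www\.)?soundcloud\.com\/.+\/(?!sets)\b(\/.+)?/;

const wanted = "http://soundcloud.com/okapi23/dont-turn-your-back/";
const unwanted = "http://soundcloud.com/okapi23/sets/happily-reversed/";

// Both calls report a match. For the second URL, ".+" can absorb
// "okapi23/sets", so the lookahead is checked in front of
// "happily-reversed" instead of in front of "sets".
const attemptResults = [attempt.test(wanted), attempt.test(unwanted)];
console.log(attemptResults);
```

If that reading is right, the fix is to anchor the pattern and to stop path segments from containing "/", which is what the answers do.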
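Answer 0 (score: 5)
Assuming that the OP wants to test whether a given string is a URL that meets the following requirements:
- The URL scheme must be either http: or https:.
- The URL authority must be either //soundcloud.com or //www.soundcloud.com.
- The second path segment must not be "sets".
- Each path segment must consist of one or more "words" made up only of alphanumeric characters ([A-Za-z0-9]), with multiple words separated by exactly one dash or underscore.
- The URL path may end with an optional "/".
Here is a tested JavaScript function (with a fully commented regex) that does the trick: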
function isValidCustomUrl(text) {
    /* Here is the regex commented in free-spacing mode:
    # Match specific URL having non-"sets" 2nd path segment.
    ^                        # Anchor to start of string.
    https?:                  # URL Scheme (http or https).
    //                       # Begin URL Authority.
    (?:www\.)?               # Optional www subdomain.
    soundcloud\.com          # URL DNS domain.
    /                        # 1st path segment (can be: "sets").
    [A-Za-z0-9]+             # 1st word-portion (required).
    (?:                      # Zero or more extra word portions.
      [-_]                   # only if separated by one - or _.
      [A-Za-z0-9]+           # Additional word-portion.
    )*                       # Zero or more extra word portions.
    (?!/sets(?:/|$))         # Assert 2nd segment not "sets".
    (?:                      # 2nd and 3rd path segments.
      /                      # Additional path segment.
      [A-Za-z0-9]+           # 1st word-portion.
      (?:                    # Zero or more extra word portions.
        [-_]                 # only if separated by one - or _.
        [A-Za-z0-9]+         # Additional word-portion.
      )*                     # Zero or more extra word portions.
    ){1,2}                   # 2nd path segment required, 3rd optional.
    /?                       # URL may end with optional /.
    $                        # Anchor to end of string.
    */
    // Same regex in javascript syntax:
    var re = /^https?:\/\/(?:www\.)?soundcloud\.com\/[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*(?!\/sets(?:\/|$))(?:\/[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*){1,2}\/?$/i;
    return re.test(text);
}
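JavaScript regex literals have no free-spacing mode, which is why the commented version above lives in a block comment. As a sketch of one alternative (not part of the original answer), the same pattern can be assembled from commented string pieces and passed to the RegExp constructor. The names WORDS, assembled and isValidCustomUrlSketch are hypothetical:

```javascript
// One path segment body: alphanumeric words joined by a single "-" or "_".
const WORDS = "[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*";

// Pieces mirror the commented regex in the answer; "/" needs no escaping
// inside a string passed to the RegExp constructor.
const assembled = new RegExp([
  "^https?:",                       // scheme: http or https
  "//(?:www\\.)?soundcloud\\.com",  // authority, optional www subdomain
  "/" + WORDS,                      // 1st path segment
  "(?!/sets(?:/|$))",               // 2nd segment must not be "sets"
  "(?:/" + WORDS + "){1,2}",        // 2nd segment required, 3rd optional
  "/?$",                            // optional trailing slash, end of string
].join(""), "i");

function isValidCustomUrlSketch(text) {
  return assembled.test(text);
}

console.log(isValidCustomUrlSketch("http://soundcloud.com/okapi23/dont-turn-your-back/"));
console.log(isValidCustomUrlSketch("http://soundcloud.com/okapi23/sets/happily-reversed/"));
```

The first call is expected to accept the URL and the second to reject it, matching the two examples in the question.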
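Answer 1 (score: 4)
Instead of . use [a-zA-Z][\w-]*, which means "match a letter, followed by any number of letters, digits, underscores, or hyphens":
^https?://(www\.)?soundcloud\.com/[a-zA-Z][\w-]*/(?!sets(/|$))[a-zA-Z][\w-]*(/[a-zA-Z][\w-]*)?/?$
To make the trailing slash optional, use /?$.
In a JavaScript regex literal, every forward slash must be escaped.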
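To illustrate the escaping remark, here is a minimal sketch of this answer's pattern written as a JavaScript regex literal and run against the two URLs from the question. The names answer1 and answer1Results are illustrative only:

```javascript
// Same pattern as above, with every "/" written as "\/" for the literal.
const answer1 = /^https?:\/\/(www\.)?soundcloud\.com\/[a-zA-Z][\w-]*\/(?!sets(\/|$))[a-zA-Z][\w-]*(\/[a-zA-Z][\w-]*)?\/?$/;

const answer1Results = {
  track: answer1.test("http://soundcloud.com/okapi23/dont-turn-your-back/"),
  sets: answer1.test("http://soundcloud.com/okapi23/sets/happily-reversed/"),
  // "setsail" only starts with "sets", so the lookahead does not reject it.
  setsail: answer1.test("http://soundcloud.com/okapi23/setsail"),
};
console.log(answer1Results);
```

The "(/|$)" inside the lookahead is what limits the rejection to a segment that is exactly "sets".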
Answer 2 (score: 1)
I suggest using this regular expression:
^https?:\/\/soundcloud\.com(?!\/[^\/]+\/sets(?:\/|$))(?:\/[^\/]+){2,3}\/?$
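A quick sketch of how this pattern behaves in JavaScript, using the two URLs from the question (the names answer2 and answer2Results are illustrative). Note that, as written, the pattern has no optional www part and accepts any non-slash characters within a segment:

```javascript
// The pattern above is already slash-escaped, so it can be used as a literal.
const answer2 = /^https?:\/\/soundcloud\.com(?!\/[^\/]+\/sets(?:\/|$))(?:\/[^\/]+){2,3}\/?$/;

const answer2Results = {
  track: answer2.test("http://soundcloud.com/okapi23/dont-turn-your-back/"),
  sets: answer2.test("http://soundcloud.com/okapi23/sets/happily-reversed/"),
  // The www subdomain is not covered by this pattern.
  www: answer2.test("http://www.soundcloud.com/okapi23/dont-turn-your-back/"),
};
console.log(answer2Results);
```

If www URLs should also pass, adding (?:www\.)? before soundcloud, as the other answers do, would be one way to extend it.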