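I have two URLs and need to capture the string that follows the domain extension, but only if it is a two-character string followed by "/". So far I have this: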
var t1 = "http://www.test.net/shop/test-3";
var t2 = "http://www.test.net/gb/shop/test-2";
var rgx = /\.([a-z]{0,3})\/([a-z]{2}\/)?/;
console.log(rgx.exec(t1));
console.log(rgx.exec(t2));
This outputs:
[".net/", "net", undefined]
[".net/gb/", "net", "gb/"]
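This is correct, except that I don't want to capture "gb/", only "gb". Any ideas? I'm stumped.
Answer 0 (score: 6)
One technique you can use is a capturing group inside an optional non-capturing group: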
var t1 = "http://www.test.net/shop/test-3";
var t2 = "http://www.test.net/gb/shop/test-2";
console.log(/\.([a-z]+)\/(?:([a-z]{2})\/)?/.exec(t1));
console.log(/\.([a-z]+)\/(?:([a-z]{2})\/)?/.exec(t2));
See the regex demo.
As an alternative, this regex seems safer because it is more precise:
/^https?:\/\/[^\/]+\.([a-z]+)\/(?:([a-z]{2})\/)?/
Details:
^ - start of string
https?:\/\/ - the protocol part (http:// or https://)
[^\/]+\.([a-z]+)\/ - the domain part: one or more characters other than /, then a ., then the TLD (1 or more letters) captured into Group 1, and then a /
(?:([a-z]{2})\/)? - an optional sequence:
([a-z]{2}) - Group 2, capturing 2 lowercase ASCII letters
\/ - a slash.
var t1 = "http://www.test.net/shop/test-3";
var t2 = "http://www.test.net/gb/shop/test-2";
console.log(/^https?:\/\/[^\/]+\.([a-z]+)\/(?:([a-z]{2})\/)?/.exec(t1));
console.log(/^https?:\/\/[^\/]+\.([a-z]+)\/(?:([a-z]{2})\/)?/.exec(t2));
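The stricter pattern can be wrapped in a small helper so that callers do not have to index into the match array. This is only a sketch: the function name `extractParts` and the `tld` / `code` property names are made up for illustration and are not part of the original answer.

```javascript
// Hypothetical helper (name and return shape are assumptions, not from the answer).
// Returns null when the URL does not match the pattern at all.
function extractParts(url) {
  var match = /^https?:\/\/[^\/]+\.([a-z]+)\/(?:([a-z]{2})\/)?/.exec(url);
  if (match === null) {
    return null;
  }
  // match[2] is undefined when the optional two-letter segment is absent.
  return { tld: match[1], code: match[2] };
}

console.log(extractParts("http://www.test.net/shop/test-3"));    // { tld: 'net', code: undefined }
console.log(extractParts("http://www.test.net/gb/shop/test-2")); // { tld: 'net', code: 'gb' }
```

Because the slash sits in the non-capturing group, Group 2 holds "gb" rather than "gb/".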
Answer 1 (score: 4)
Another approach is to parse out the first path element after the domain extension:
function parse(str) {
  // Remove the domain extension and everything before that.
  // Then return the first section of the rest, before `/`
  return str.replace(/.+\.\w+\//, '')
    .split('/')[0];
}
console.log(parse("http://www.test.net/shop/test-3"));
console.log(parse("http://www.test.net/gb/shop/test-2"));
console.log(parse("http://www.test.net/nl"));
This way, you can easily check the length of the returned result.
Regex explanation:
.+\.\w+\/
.+ - matches any character (except newline)
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\. - matches the character . literally
\w+ - match any word character [a-zA-Z0-9_]
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\/ - matches the character / literally
This regex basically grabs everything before the domain extension, the domain extension itself, and the / that follows it.
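The length check mentioned above might look like the following sketch. `parse` is repeated so the snippet runs on its own; `twoLetterSegment` is a hypothetical name, and restricting the segment to two lowercase ASCII letters is an assumption carried over from the question's `[a-z]{2}`.

```javascript
function parse(str) {
  // Strip everything up to and including "<extension>/", then take the first path segment.
  return str.replace(/.+\.\w+\//, '').split('/')[0];
}

// Hypothetical wrapper: return the segment only if it is exactly two lowercase letters.
function twoLetterSegment(str) {
  var first = parse(str);
  return /^[a-z]{2}$/.test(first) ? first : null;
}

console.log(twoLetterSegment("http://www.test.net/shop/test-3"));    // null
console.log(twoLetterSegment("http://www.test.net/gb/shop/test-2")); // "gb"
console.log(twoLetterSegment("http://www.test.net/nl"));             // "nl"
```

Note that the greedy `.+` backtracks to the last `.` that is followed by word characters and a `/`, so a path containing such a sequence later in the URL would shift the cut point.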
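Answer 2 (score: 0)
You can simply put the forward slash in a lookahead, so that it is required but not captured: (?=\/)
Edit: As Evaldas Raisutis mentioned in the comments, the two characters would then fail to match when they are the last thing in the URL with no trailing slash. You can use (?=\/|$) instead, which accepts either a / or the end of the string and so makes the trailing slash optional. This turns your pattern into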
\.([a-z]{0,3})\/([a-z]{2}(?=\/|$))?
var t1 = "http://www.test.net/shop/test-3";
var t2 = "http://www.test.net/gb/shop/test-2";
var t3 = "http://www.test.net/de/";
var t4 = "http://www.test.net/fr";
var rgx = /\.([a-z]{0,3})\/([a-z]{2}(?=\/|$))?/;
console.log(rgx.exec(t1));
console.log(rgx.exec(t2));
console.log(rgx.exec(t3));
console.log(rgx.exec(t4));
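To make the effect of the lookahead easier to see, the sketch below collects only Group 2 for each of the four URLs. The helper name `countryCode` is hypothetical and is not part of the answer.

```javascript
// Hypothetical helper: return Group 2 of the lookahead pattern, or undefined.
function countryCode(url) {
  var match = /\.([a-z]{0,3})\/([a-z]{2}(?=\/|$))?/.exec(url);
  return match ? match[2] : undefined;
}

console.log(countryCode("http://www.test.net/shop/test-3"));    // undefined ("sh" is followed by "o")
console.log(countryCode("http://www.test.net/gb/shop/test-2")); // "gb"
console.log(countryCode("http://www.test.net/de/"));            // "de"
console.log(countryCode("http://www.test.net/fr"));             // "fr" (matched via $)
```

Since a lookahead does not consume characters, the slash is left out of both the overall match and Group 2.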