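Regex match ending with "/"

Date: 2016-09-21 07:20:14

Tags: javascript regex

I have two URLs and need to capture the string after the domain extension if it is a two-character string ending with "/". So far I have this: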

var t1 = "http://www.test.net/shop/test-3";
var t2 = "http://www.test.net/gb/shop/test-2";

var rgx = /\.([a-z]{0,3})\/([a-z]{2}\/)?/;

console.log(rgx.exec(t1));
console.log(rgx.exec(t2));

This outputs:

[".net/", "net", undefined]
[".net/gb/", "net", "gb/"]

This is correct, except that I don't want to capture "gb/", only "gb". Any ideas? I'm stumped.

3 answers:

Answer 0 (score: 6)

One technique you can use is a capturing group inside an optional non-capturing group:

var t1 = "http://www.test.net/shop/test-3";
var t2 = "http://www.test.net/gb/shop/test-2";
console.log(/\.([a-z]+)\/(?:([a-z]{2})\/)?/.exec(t1));
console.log(/\.([a-z]+)\/(?:([a-z]{2})\/)?/.exec(t2));

See the regex demo.
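A minimal sketch wrapping this pattern in a helper. The name `twoLetterSegment` is hypothetical, and returning `null` when group 2 did not participate is an assumption about what the caller wants.

```javascript
// Pattern from this answer: group 1 = TLD, group 2 = optional
// two-letter segment. The slash sits outside group 2, so it is not captured.
const RGX = /\.([a-z]+)\/(?:([a-z]{2})\/)?/;

// Hypothetical helper: return the two-letter segment or null
function twoLetterSegment(url) {
  const m = RGX.exec(url);
  // m[2] is undefined when the optional non-capturing group did not match
  return m && m[2] !== undefined ? m[2] : null;
}

console.log(twoLetterSegment("http://www.test.net/shop/test-3"));    // logs: null
console.log(twoLetterSegment("http://www.test.net/gb/shop/test-2")); // logs: gb
```

Because the outer group is non-capturing, only the inner `([a-z]{2})` shows up in the result array, which is why the slash is matched but not returned.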

Speaking of alternatives, this regex seems safer because it is more precise:

/^https?:\/\/[^\/]+\.([a-z]+)\/(?:([a-z]{2})\/)?/

See this regex demo.

Details:

  • ^ - start of string
  • https?:\/\/ - protocol part (http:// or https://)
  • [^\/]+\.([a-z]+)\/ - domain part: one or more characters other than /, then ., then the TLD (one or more letters, [a-z]+) captured into Group 1, then /
  • (?:([a-z]{2})\/)? - an optional sequence of:
    • ([a-z]{2}) - Group 2, capturing 2 lowercase ASCII letters
    • \/ - a slash.

var t1 = "http://www.test.net/shop/test-3";
var t2 = "http://www.test.net/gb/shop/test-2";
console.log(/^https?:\/\/[^\/]+\.([a-z]+)\/(?:([a-z]{2})\/)?/.exec(t1));
console.log(/^https?:\/\/[^\/]+\.([a-z]+)\/(?:([a-z]{2})\/)?/.exec(t2));
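To illustrate what the extra precision buys, the sketch below runs both patterns from this answer on a real URL and on a string that is not a URL at all. The sample `"readme.md/gb/"` and the helper name `segment` are made up for illustration.

```javascript
// Loose pattern: matches any ".letters/" anywhere in the string
const loose = /\.([a-z]+)\/(?:([a-z]{2})\/)?/;
// Anchored pattern: needs an http:// or https:// prefix and a host first
const strict = /^https?:\/\/[^\/]+\.([a-z]+)\/(?:([a-z]{2})\/)?/;

// Hypothetical helper: return group 2 (the two-letter segment), or null
// when there is no match or the optional group did not participate
function segment(rgx, s) {
  const m = rgx.exec(s);
  return m && m[2] !== undefined ? m[2] : null;
}

const samples = [
  "http://www.test.net/gb/shop/test-2",
  "readme.md/gb/", // hypothetical input that is not a URL
];

for (const s of samples) {
  console.log(s, "loose:", segment(loose, s), "strict:", segment(strict, s));
}
```

Both patterns agree on the real URL; only the anchored one rejects the non-URL string.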

Answer 1 (score: 4)

Another approach is to parse the first path section after the domain extension:

function parse(str){
    // Remove the domain extension and everything before that.
    // Then return the first section of the rest, before `/`
    return str.replace(/.+\.\w+\//, '')
              .split('/')[0];
}
console.log(parse("http://www.test.net/shop/test-3"));
console.log(parse("http://www.test.net/gb/shop/test-2"));
console.log(parse("http://www.test.net/nl"));

This way, you can easily check the length of the returned result.
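For example, the length check could look like this. `parse` is the function from this answer; `twoLetterCode` is a hypothetical wrapper name, and returning `null` for anything other than two characters is an assumption.

```javascript
function parse(str) {
  // Remove the domain extension and everything before it,
  // then return the first section of the rest, before `/`
  return str.replace(/.+\.\w+\//, "").split("/")[0];
}

// Hypothetical wrapper: keep the section only if it is exactly two characters
function twoLetterCode(url) {
  const first = parse(url);
  return first.length === 2 ? first : null;
}

console.log(twoLetterCode("http://www.test.net/shop/test-3"));    // logs: null
console.log(twoLetterCode("http://www.test.net/gb/shop/test-2")); // logs: gb
console.log(twoLetterCode("http://www.test.net/nl"));             // logs: nl
```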

Regex explanation:

.+\.\w+\/
.+  - matches any character (except newline)
          Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\.  - matches the character . literally
\w+ - matches any word character [a-zA-Z0-9_]
          Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\/  - matches the character / literally

This regex essentially matches everything before the domain extension, the domain extension itself, and the / that follows it.
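To see what the `replace` step leaves behind, the sketch below logs the intermediate string. The last input is a made-up URL with a dot later in the path: because `.+` is greedy, the match runs up to the *last* `.word/` sequence rather than the domain extension, so this approach assumes the path itself contains no such sequence. The helper name `afterExtension` is hypothetical.

```javascript
// Hypothetical helper: what remains after the replace step in parse()
function afterExtension(u) {
  // Greedy .+ removes everything up to and including the LAST ".word/"
  return u.replace(/.+\.\w+\//, "");
}

const urls = [
  "http://www.test.net/gb/shop/test-2",
  "http://www.test.net/nl",
  "http://www.test.net/gb/v1.2/page", // hypothetical path containing ".2/"
];

for (const u of urls) {
  const rest = afterExtension(u);
  console.log(rest, "->", rest.split("/")[0]);
}
```

The third URL yields `page` instead of `gb`, which is the case to watch for if paths can contain dots.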

Answer 2 (score: 0)

You can simply turn the forward slash into a lookahead, (?=\/), so that it is checked but not included in the capture group.
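Applied to the question's pattern, a minimal sketch of the slash-only lookahead looks like this (the variable names are just for illustration):

```javascript
// The trailing "/" is asserted with (?=\/) instead of being consumed,
// so group 2 holds just the two letters.
const rgx = /\.([a-z]{0,3})\/([a-z]{2}(?=\/))?/;

const m1 = rgx.exec("http://www.test.net/shop/test-3");
const m2 = rgx.exec("http://www.test.net/gb/shop/test-2");
console.log(m1[0], m1[2]); // logs: .net/ undefined
console.log(m2[0], m2[2]); // logs: .net/gb gb
```

Note that the overall match `m2[0]` is now `.net/gb` without the final slash, since a lookahead does not consume characters.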

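Edit: as Evaldas Raisutis mentioned in the comments, if the two characters are the *last* part of the URL with no trailing slash, they will not be matched. You can use (?=\/|$) instead to match either a / or the end of the line, which makes the trailing slash optional. This turns your pattern into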

\.([a-z]{0,3})\/([a-z]{2}(?=\/|$))?

See it on Regex101.

var t1 = "http://www.test.net/shop/test-3";
var t2 = "http://www.test.net/gb/shop/test-2";
var t3 = "http://www.test.net/de/";
var t4 = "http://www.test.net/fr";

var rgx = /\.([a-z]{0,3})\/([a-z]{2}(?=\/|$))?/;

console.log(rgx.exec(t1));
console.log(rgx.exec(t2));
console.log(rgx.exec(t3));
console.log(rgx.exec(t4));
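A sketch that runs the three answers side by side on the four sample URLs from this answer. The helper names `a0`, `a1` and `a2` are hypothetical, and each returns `null` when no two-letter segment is found. The patterns in Answer 0 require the trailing slash, so they return nothing for `.../fr`, while the other two accept it; which behavior is right depends on how strictly "ends with /" is meant.

```javascript
const urls = [
  "http://www.test.net/shop/test-3",
  "http://www.test.net/gb/shop/test-2",
  "http://www.test.net/de/",
  "http://www.test.net/fr",
];

// Answer 0: capture group inside an optional non-capturing group
const a0 = (u) => {
  const m = /\.([a-z]+)\/(?:([a-z]{2})\/)?/.exec(u);
  return m && m[2] !== undefined ? m[2] : null;
};

// Answer 1: strip up to the extension, take the first section, check length
const a1 = (u) => {
  const first = u.replace(/.+\.\w+\//, "").split("/")[0];
  return first.length === 2 ? first : null;
};

// Answer 2: lookahead for "/" or end of string
const a2 = (u) => {
  const m = /\.([a-z]{0,3})\/([a-z]{2}(?=\/|$))?/.exec(u);
  return m && m[2] !== undefined ? m[2] : null;
};

for (const u of urls) {
  console.log(u, a0(u), a1(u), a2(u));
}
```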