Question

I have two URLs and need to capture the string after the domain extension if it is a two-character string that ends with "/". So far I have this:

var t1 = "http://www.test.net/shop/test-3";
var t2 = "http://www.test.net/gb/shop/test-2";

var rgx = /\.([a-z]{0,3})\/([a-z]{2}\/)?/;



console.log(rgx.exec(t1));

console.log(rgx.exec(t2));

This outputs:

[".net/", "net", undefined]
[".net/gb/", "net", "gb/"]

This is correct, except that I don't want to capture "gb/", only "gb". Any ideas? I'm stumped.

Answer 1

A technique you can use is to put a capturing group inside an optional non-capturing group:

var t1 = "http://www.test.net/shop/test-3";
var t2 = "http://www.test.net/gb/shop/test-2";
console.log(/\.([a-z]+)\/(?:([a-z]{2})\/)?/.exec(t1));
console.log(/\.([a-z]+)\/(?:([a-z]{2})\/)?/.exec(t2));
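To read the two-letter code itself, take index 2 of the array returned by exec(); it is undefined when the optional group did not take part in the match. A minimal sketch, where countryCode is a hypothetical helper name and returning null for "no code" is an assumed convention:

```javascript
// Hypothetical helper: returns the two-letter segment after the TLD, or null.
function countryCode(url) {
  // Same pattern as above: group 2 sits inside an optional non-capturing
  // group, so the trailing "/" is matched but not captured.
  const match = /\.([a-z]+)\/(?:([a-z]{2})\/)?/.exec(url);
  // match is null when nothing matched; match[2] is undefined when the
  // optional two-letter group was skipped.
  return match && match[2] !== undefined ? match[2] : null;
}

console.log(countryCode("http://www.test.net/shop/test-3"));    // prints: null
console.log(countryCode("http://www.test.net/gb/shop/test-2")); // prints: gb
```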

See the regex demo.

As for alternatives, this regex seems safer because it is more precise:

/^https?:\/\/[^\/]+\.([a-z]+)\/(?:([a-z]{2})\/)?/

See this regex demo.

Details:

^ - start of string
https?:\/\/ - protocol part (http:// or https://)
[^\/]+\.([a-z]+)\/ - domain name part: one or more characters other than /, then a ., then the TLD (one or more letters) captured into Group 1, then a /
(?:([a-z]{2})\/)? - an optional sequence of:
- ([a-z]{2}) - Group 2 capturing 2 lowercase ASCII letters
- \/ - a slash

var t1 = "http://www.test.net/shop/test-3";
var t2 = "http://www.test.net/gb/shop/test-2";
console.log(/^https?:\/\/[^\/]+\.([a-z]+)\/(?:([a-z]{2})\/)?/.exec(t1));
console.log(/^https?:\/\/[^\/]+\.([a-z]+)\/(?:([a-z]{2})\/)?/.exec(t2));
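If your target environment supports ES2018 named capture groups, the same anchored pattern can label its groups so callers do not depend on numeric indexes. This is a sketch of that variation, not part of the original answer; the names parts, tld and code are illustrative, and the ?? operator assumes an ES2020 runtime:

```javascript
// Same anchored pattern as above, rewritten with named capture groups.
// The group names "tld" and "code" are illustrative choices.
const rgx = /^https?:\/\/[^\/]+\.(?<tld>[a-z]+)\/(?:(?<code>[a-z]{2})\/)?/;

// Hypothetical helper: returns { tld, code } or null if the URL does not match.
function parts(url) {
  const m = rgx.exec(url);
  if (m === null) return null;
  // m.groups.code is undefined when the optional two-letter segment is absent.
  return { tld: m.groups.tld, code: m.groups.code ?? null };
}

console.log(parts("http://www.test.net/shop/test-3"));    // → { tld: 'net', code: null }
console.log(parts("http://www.test.net/gb/shop/test-2")); // → { tld: 'net', code: 'gb' }
```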

Answer 2

Another approach is to parse out the first path element after the domain extension:


function parse(str){
    // Remove the domain extension and everything before that.
    // Then return the first section of the rest, before `/`
    return str.replace(/.+\.\w+\//, '')
              .split('/')[0];
}
console.log(parse("http://www.test.net/shop/test-3"));
console.log(parse("http://www.test.net/gb/shop/test-2"));
console.log(parse("http://www.test.net/nl"));


This way, you can easily check the length of the returned result.

Regex explanation:

.+\.\w+\/
.+  - matches any character (except newline)
          Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\.  - matches the character . literally
\w+ - matches any word character [a-zA-Z0-9_]
          Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\/  - matches the character / literally

This regex basically grabs everything before the domain extension, the domain extension itself, and the / that follows it.
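The length check can be wrapped around parse(). A minimal sketch, assuming a hypothetical twoLetterSegment helper that returns null when the first path segment is not exactly two characters long:

```javascript
// Answer 2's parse(), unchanged: drop everything up to and including the
// domain extension and its "/", then take the first remaining path segment.
function parse(str) {
  return str.replace(/.+\.\w+\//, '').split('/')[0];
}

// Hypothetical wrapper applying the length check.
function twoLetterSegment(str) {
  const segment = parse(str);
  return segment.length === 2 ? segment : null;
}

console.log(twoLetterSegment("http://www.test.net/shop/test-3"));    // prints: null
console.log(twoLetterSegment("http://www.test.net/gb/shop/test-2")); // prints: gb
console.log(twoLetterSegment("http://www.test.net/nl"));             // prints: nl
```

Note that this accepts "nl" even though it has no trailing "/". Also, because .+ is greedy, a later dotted segment in the path (for example /a.bc/) would move the cut point past it.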

Answer 3

You can simply use the forward slash as a lookahead, (?=\/), which checks for it without putting it in the capture group.
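To see the difference, run the question's original pattern and a lookahead version against the same URL. A small sketch (the variable names are illustrative):

```javascript
const url = "http://www.test.net/gb/shop/test-2";

// Original pattern: the "/" is inside group 2, so it is captured.
const consuming = /\.([a-z]{0,3})\/([a-z]{2}\/)?/.exec(url);

// Lookahead version: (?=\/) only checks that a "/" follows; it is not
// consumed, so it appears in neither group 2 nor the overall match.
const lookahead = /\.([a-z]{0,3})\/([a-z]{2}(?=\/))?/.exec(url);

console.log(consuming[0], consuming[2]); // prints: .net/gb/ gb/
console.log(lookahead[0], lookahead[2]); // prints: .net/gb gb
```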

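Edit: as Evaldas Raisutis mentioned in the comments, if the two characters are the last thing in the URL, they will not be matched because there is no trailing slash. You can use (?=\/|$) to match either a / or the end of the line, which makes the trailing slash optional. This turns your pattern into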

\.([a-z]{0,3})\/([a-z]{2}(?=\/|$))?

See it in Regex101.


var t1 = "http://www.test.net/shop/test-3";
var t2 = "http://www.test.net/gb/shop/test-2";
var t3 = "http://www.test.net/de/";
var t4 = "http://www.test.net/fr";

var rgx = /\.([a-z]{0,3})\/([a-z]{2}(?=\/|$))?/;

console.log(rgx.exec(t1));
console.log(rgx.exec(t2));
console.log(rgx.exec(t3));
console.log(rgx.exec(t4));
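The edit matters for URLs whose two-letter segment comes last. A quick comparison sketch, running Answer 1's pattern and this one against the two extra test URLs from the snippet above:

```javascript
// Patterns copied from Answer 1 and Answer 3.
const answer1 = /\.([a-z]+)\/(?:([a-z]{2})\/)?/;
const answer3 = /\.([a-z]{0,3})\/([a-z]{2}(?=\/|$))?/;

const urls = [
  "http://www.test.net/de/", // two letters plus trailing slash
  "http://www.test.net/fr",  // two letters at the very end, no slash
];

for (const url of urls) {
  // Index 2 is undefined when the optional two-letter group is skipped.
  const g1 = answer1.exec(url)[2];
  const g3 = answer3.exec(url)[2];
  console.log(url, g1, g3);
}
// prints:
// http://www.test.net/de/ de de
// http://www.test.net/fr undefined fr
```

Answer 1's pattern requires the trailing slash, so it reports undefined for .../fr, whereas the (?=\/|$) version captures fr.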

