Question

I am trying to write a regex that extracts the subdomain/domain parts of a URL as separate strings.

I tried this:

/^[^:]+:\/\/([^\.\/]+)(\.[^\.\/]+)+(?:\/|$)/

It should work for these URLs:

http://www.mail.yahoo.co.uk/blah/blah

http://test.test.again.mail.yahoo.com/blah/blah

I want to split them into parts like this:

["http://", "www", ".mail", ".yahoo", ".co", ".uk"]

["http://", "test", ".test", ".again", ".mail", ".yahoo", ".com"]

Right now I can only capture them as:

["http://", "www", ".uk"]

["http://", "test", ".com"]

Does anyone know how to fix my regex?

Answer 1

You can use /(http[s]?:\/\/|\w+(?=\.)|\.\w+)/g.
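As a quick check of the pattern in this answer, here is a minimal sketch that runs it against the two URLs from the question. The helper name `splitWithAlternation` is made up for illustration; only the regex comes from the answer.

```javascript
// Pattern from the answer: the protocol, OR a word that is followed by a dot,
// OR a dot followed by a word.
const partsPattern = /(http[s]?:\/\/|\w+(?=\.)|\.\w+)/g;

// Hypothetical helper name; returns an empty array when nothing matches.
function splitWithAlternation(url) {
  return url.match(partsPattern) || [];
}

console.log(splitWithAlternation('http://www.mail.yahoo.co.uk/blah/blah'));
// [ 'http://', 'www', '.mail', '.yahoo', '.co', '.uk' ]

console.log(splitWithAlternation('http://test.test.again.mail.yahoo.com/blah/blah'));
// [ 'http://', 'test', '.test', '.again', '.mail', '.yahoo', '.com' ]
```

Note that the pattern scans the whole string, so a dot in the path is picked up too: `http://example.com/index.html` also yields `index` and `.html`.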

Answer 2

You can use the regex

(^\w+:\/\/)([^.]+)

to match the first part, and then use

\.\w+

to match the remaining parts.

Check the snippet:

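function getSubDomains(str){
    // First part: the protocol and the first label, e.g. "http://" and "www"
    let result = str.match(/(^\w+:\/\/)([^.]+)/);
    if (!result) return [];        // input does not look like protocol://host
    result = result.slice(1);      // drop the full match, keep the two groups
    // Remaining parts: every ".label" that follows
    result = result.concat(str.match(/\.\w+/g) || []);
    console.log(result);
    return result;
}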

getSubDomains('http://www.mail.yahoo.co.uk/blah/blah');
getSubDomains('http://test.test.again.mail.yahoo.com/blah/blah');
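One caveat with the snippet above: `\.\w+` is applied to the whole string, so a dot in the path (for example `/index.html`) would be matched as well. A possible workaround, sketched here with a different technique than the answer uses, is to isolate the host with the standard `URL` constructor before splitting. The function name `splitHost` is hypothetical.

```javascript
// Hypothetical helper, not part of the original answer: let the URL API find
// the host, then split it on dots instead of running a regex over the path.
function splitHost(urlString) {
  const url = new URL(urlString);            // throws a TypeError on invalid input
  const [first, ...rest] = url.hostname.split('.');
  // url.protocol is e.g. "http:", so append the two slashes manually
  return [url.protocol + '//', first, ...rest.map(label => '.' + label)];
}

console.log(splitHost('http://www.mail.yahoo.co.uk/blah/blah'));
// [ 'http://', 'www', '.mail', '.yahoo', '.co', '.uk' ]

console.log(splitHost('http://example.com/index.html'));
// [ 'http://', 'example', '.com' ]
```

Keep in mind that `URL` normalizes the hostname (for example, it lowercases it), so the parts may not be byte-for-byte identical to the input.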

Answer 3

How about using the sticky flag y to chain the matches?

var str = 'http://test.test.again.mail.yahoo.com/blah/blah';

var res = str.match(/^[a-z]+:\/\/|\.?[^/.\s]+/yig);

console.log(res);

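^[a-z]+:\/\/ matches the protocol: start of string, one or more a-z, followed by a colon and two slashes.
|\.?[^/.\s]+ or an optional dot followed by one or more characters that are not a slash, dot, or whitespace.

Because of the y flag, each match has to start exactly where the previous one ended, so matching stops at the first / of the path.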

See Regex101 demo for more explanation
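To illustrate what the sticky flag contributes here, the following sketch runs the same pattern with and without `y`. The variable names are chosen for illustration only.

```javascript
const url = 'http://test.test.again.mail.yahoo.com/blah/blah';

// Same pattern twice; the only difference is the sticky (y) flag.
const stickyPattern = /^[a-z]+:\/\/|\.?[^/.\s]+/yig;
const globalPattern = /^[a-z]+:\/\/|\.?[^/.\s]+/ig;

// With y, every match must begin where the previous one ended,
// so the chain breaks at the first "/" after the host.
const withSticky = url.match(stickyPattern);
console.log(withSticky);
// [ 'http://', 'test', '.test', '.again', '.mail', '.yahoo', '.com' ]

// Without y, the engine skips over the "/" and keeps matching in the path.
const withoutSticky = url.match(globalPattern);
console.log(withoutSticky);
// [ 'http://', 'test', '.test', '.again', '.mail', '.yahoo', '.com', 'blah', 'blah' ]
```

So the sticky flag is what keeps the path segments out of the result without having to cut the host out of the string first.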

Regex to match each subdomain of a URL

3 Answers: