Splitting a string with a regular expression produces 'undefined' entries

Time: 2018-12-21 03:40:21

Tags: javascript regex split undefined

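I want to extract the following fields from a URL: the protocol, domain name, port, and path.

I know the split function can help with this. Here is my code: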

"https://www.test.com:8081/a/b/c".split(/(:\/\/)|(:)|(\/)/)

The result is

["https", "://", undefined, undefined, "www.test.com", undefined, ":", undefined, "8081", undefined, undefined, "/", "a", undefined, undefined, "/", "b", undefined, undefined, "/", "c"]

I expected the result to be

['https', '://', 'www.test.com', ':', '8081', '/', 'a/b/c']

Why do the undefined entries appear, and how do I correct my regex?

3 answers:

Answer 0: (score: 1)

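When a regular expression passed to split contains capturing groups, the result includes one entry per group for every match. Because your groups sit in different alternatives, only one alternative matches each time; the groups in the other alternatives do not participate, so their entries in the result are undefined.

Instead of putting a group inside each alternative, wrap a single group around all of the alternatives.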

console.log("https://www.test.com:8081/a/b/c".split(/(:\/\/|:|\/)/));
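Note that the single-group pattern still splits the path at every slash, so the output has separate "a", "/", "b", "/", "c" entries rather than one 'a/b/c'. A minimal sketch of one way to get the exact array from the question, assuming the URL always has the shape protocol://host:port/path so that the first six entries are the fixed ones (the variable names here are illustrative only):

```javascript
const input = "https://www.test.com:8081/a/b/c";

// One capturing group around all alternatives: one extra entry per match.
const parts = input.split(/(:\/\/|:|\/)/);

// Assumption: protocol, "://", host, ":", port and the first "/" come first.
const head = parts.slice(0, 6);

// Glue the remaining pieces (path segments and their slashes) back together.
const path = parts.slice(6).join("");

const result = [...head, path];
console.log(result);
// ['https', '://', 'www.test.com', ':', '8081', '/', 'a/b/c']
```

If the port may be missing, the fixed positions shift and this slicing no longer applies.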

Answer 1: (score: 1)

Another approach is to use the URL object to extract the parts:

var url = new URL('https://www.test.com:8081/a/b/c');
console.log(url.protocol);
console.log(url.hostname);
console.log(url.port);
console.log(url.pathname);
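The values returned by URL differ slightly from the fields in the question: protocol keeps its trailing colon ("https:") and pathname keeps its leading slash ("/a/b/c"). A small sketch that trims both, where parseUrlParts and the property names it returns are hypothetical and chosen only for this example:

```javascript
// Hypothetical helper: maps a URL string to the four fields from the question.
function parseUrlParts(input) {
  const url = new URL(input); // throws a TypeError if input is not a valid URL
  return {
    protocol: url.protocol.replace(/:$/, ""), // "https:" -> "https"
    domain: url.hostname,
    port: url.port, // empty string when no port is given
    path: url.pathname.replace(/^\//, ""), // "/a/b/c" -> "a/b/c"
  };
}

const parsed = parseUrlParts("https://www.test.com:8081/a/b/c");
console.log(parsed);
// { protocol: 'https', domain: 'www.test.com', port: '8081', path: 'a/b/c' }
```

URL also reports an empty port when the URL uses the default port for its scheme, so an explicit ":443" on an https URL would not come back.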

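Answer 2: (score: 1)

Capture groups are always included in the result of split. When an alternation contains a capture group that does not participate in a particular match, that group captures nothing, but it is still a capture group inside the split pattern, so undefined is added to the array at that position. For example: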

console.log('abc'.split(/b|(wontmatch)/));

// a more complicated example:

console.log('abcde'.split(/(b)|(d)/));

/*
[
  "a",        split substring
  "b",        (b) matched, so its capture is included in the result
  undefined,  (d) did not participate in this match, so its slot is undefined
  "c",        split substring
  undefined,  (b) did not participate in this match, so its slot is undefined
  "d",        (d) matched, so its capture is included in the result
  "e"         split substring
]
*/

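The behavior you are seeing is just a more complicated version of the above.

You might consider using match instead of split, which may be easier to understand: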
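If the original three-group pattern has to stay as it is, one possible workaround is to drop the undefined placeholders after splitting. This is only a sketch; the variable names raw and cleaned are illustrative:

```javascript
const raw = "https://www.test.com:8081/a/b/c".split(/(:\/\/)|(:)|(\/)/);

// Every match contributes three slots, one per capture group;
// only the group that actually matched holds a string.
const cleaned = raw.filter((part) => part !== undefined);

console.log(raw.length, cleaned.length); // 21 11
console.log(cleaned);
```

The filtered array still has the path split at each slash, because the pattern treats every "/" as a separator.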

const str = "https://www.test.com:8081/a/b/c";
const matches = str.match(/([^:]+)(:\/\/)([^:]+)(:)(\d+)(\/)(.*$)/);
console.log(matches);

// matches[0] is the whole matched string; the captured groups follow it, so
// matches.slice(1) is ['https', '://', 'www.test.com', ':', '8081', '/', 'a/b/c']

Or, if you only want the protocol, domain, port, and path, drop the capture groups you do not need:

const str = "https://www.test.com:8081/a/b/c";
const [, protocol, domain, port, path] = str.match(
  /([^:]+):\/\/([^:]+):(\d+)\/(.*$)/
);
console.log(protocol, domain, port, path);

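If the port is optional, put it and the preceding : in an optional non-capturing group, and change the second character class to [^:/] so that it cannot match a slash: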

const str = "https://www.test.com/a/b/c";
const [, protocol, domain, port, path] = str.match(
  /([^:]+):\/\/([^:/]+)(?::(\d+))?\/(.*$)/
);
console.log(protocol, domain, port, path);
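In the snippet above, port is undefined when the URL has no port, and the destructuring throws if match returns null for a string that does not fit the pattern. A hedged sketch that wraps the same idea in a function and guards against both cases; splitUrl is a hypothetical name, and the added ^ anchor is an assumption made here so that the match must start at the beginning of the string:

```javascript
// Hypothetical helper built on the optional-port pattern above.
function splitUrl(str) {
  const m = str.match(/^([^:]+):\/\/([^:/]+)(?::(\d+))?\/(.*)$/);
  if (m === null) {
    return null; // the string does not look like protocol://host[:port]/path
  }
  const [, protocol, domain, port, path] = m;
  return { protocol, domain, port, path }; // port is undefined if absent
}

const withPort = splitUrl("https://www.test.com:8081/a/b/c");
const withoutPort = splitUrl("https://www.test.com/a/b/c");
const invalid = splitUrl("not a url");

console.log(withPort.port);    // '8081'
console.log(withoutPort.port); // undefined
console.log(invalid);          // null
```

Because the pattern requires a slash after the host, a URL such as "https://www.test.com" with no path also returns null here.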