Question

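I am trying to get the base URL from a string (so no window.location).

It needs to remove the trailing slash
It must be a regex (no new URL)
It needs to work with query parameters and anchor links

In other words, all of the following should return https://apple.com or (for the last one) https://www.apple.com.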

https://apple.com?query=true&slash=false
https://apple.com#anchor=true&slash=false
http://www.apple.com/#anchor=true&slash=true&whatever=foo

These are just examples; URLs can have different subdomains. For instance, https://shop.apple.co.uk/?query=foo should return https://shop.apple.co.uk. It could be any URL, such as https://foo.bar

The closest I have got:

const baseUrl = url.replace(/^((\w+:)?\/\/[^\/]+\/?).*$/,'$1').replace(/\/$/, ""); // Base Path & Trailing slash

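But this does not work for anchor links and queries that start right after the host, without a / before them.

Any ideas on how I can get it to work in all cases?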
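
To reproduce the problem, here is a small sketch that wraps the expression above in a function. The name `attemptBaseUrl` is only for illustration and is not part of my real code.

```javascript
// Illustration only: the expression from the question, wrapped in a function.
function attemptBaseUrl(url) {
  return url.replace(/^((\w+:)?\/\/[^\/]+\/?).*$/, "$1").replace(/\/$/, "");
}

// Works when a "/" separates the host from the rest:
console.log(attemptBaseUrl("http://www.apple.com/#anchor=true&slash=true&whatever=foo"));
// http://www.apple.com

// Fails when "?" or "#" follows the host directly,
// because [^\/]+ keeps matching to the end of the string:
console.log(attemptBaseUrl("https://apple.com?query=true&slash=false"));
// https://apple.com?query=true&slash=false
```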

Answer 1

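You can add # and ? to the negated character class. You do not need the .* because it would only match the rest of the string up to the end.

For the example data, you could match: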

^https?:\/\/[^#?\/]+


const strings = [
    "https://apple.com?query=true&slash=false",
    "https://apple.com#anchor=true&slash=false",
    "http://www.apple.com/#anchor=true&slash=true&whatever=foo",
    "https://foo.bar/?q=true"
];

strings.forEach(s => {
    console.log(s.match(/^https?:\/\/[^#?\/]+/)[0]);
});
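
If the input might not start with http:// or https://, `match` returns null and the `[0]` lookup throws. A minimal sketch of a null-safe wrapper follows; the helper name `getBaseUrl` and the choice to return null for non-matching input are assumptions, not part of the answer above.

```javascript
// Hypothetical helper: returns scheme + host, or null if the input
// does not start with http:// or https://.
function getBaseUrl(url) {
  // [^#?\/]+ stops at the first "#", "?" or "/", so no trailing slash is captured.
  const match = /^https?:\/\/[^#?\/]+/.exec(url);
  return match ? match[0] : null;
}

console.log(getBaseUrl("https://shop.apple.co.uk/?query=foo")); // https://shop.apple.co.uk
console.log(getBaseUrl("https://foo.bar"));                     // https://foo.bar
console.log(getBaseUrl("not a url"));                           // null
```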

Answer 2

This will get you everything up to the .com part. After pulling out that first part of the URL, you will have to append .com yourself.

^http.*?(?=\.com)

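Or you could do:

myUrl.replace(/(#|\?|\/#).*$/, "")

to remove everything from the first ? or # onward (a trailing slash is removed only when it sits directly before a #).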
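
To make the limits of the lookahead pattern concrete, here is a rough sketch. The function name `baseViaLookahead` is hypothetical, and the sketch assumes the caller is happy with null when the host does not contain .com.

```javascript
// Illustration only: the lookahead pattern is tied to hosts containing ".com".
const comOnly = /^http.*?(?=\.com)/;

function baseViaLookahead(url) {
  const match = comOnly.exec(url);
  // The lookahead does not consume ".com", so it has to be appended again.
  return match ? match[0] + ".com" : null;
}

console.log(baseViaLookahead("https://apple.com?query=true&slash=false")); // https://apple.com
console.log(baseViaLookahead("https://shop.apple.co.uk/?query=foo"));      // null
```

The second call shows why this pattern does not cover hosts such as shop.apple.co.uk or foo.bar.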

Answer 3

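You can use JavaScript's built-in URL for this. URL also gives you other easily accessible parsed properties, such as the query string parameters, the protocol, and so on.

Regex is a painful way to do something that JavaScript makes very simple.

I know you asked about using regex, but if you (or someone who comes here in the future) really only care about getting the information and are not set on using regex, this answer may help.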

let one = "https://apple.com?query=true&slash=false"
let two = "https://apple.com#anchor=true&slash=false"
let three = "http://www.apple.com/#anchor=true&slash=true&whatever=foo"

let urlOne = new URL(one)
console.log(urlOne.origin)

let urlTwo = new URL(two)
console.log(urlTwo.origin)

let urlThree = new URL(three)
console.log(urlThree.origin)
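
Note that `new URL()` throws a TypeError when the string is not a valid absolute URL. A minimal sketch of a guard around it follows; the helper name `getOrigin` and the null fallback are assumptions added for illustration.

```javascript
// Hypothetical helper: returns the origin, or null when the string
// cannot be parsed as an absolute URL.
function getOrigin(input) {
  try {
    return new URL(input).origin;
  } catch (err) {
    // new URL() throws a TypeError for strings it cannot parse.
    return null;
  }
}

console.log(getOrigin("http://www.apple.com/#anchor=true&slash=true&whatever=foo")); // http://www.apple.com
console.log(getOrigin("https://shop.apple.co.uk/?query=foo"));                       // https://shop.apple.co.uk
console.log(getOrigin("apple.com"));                                                 // null
```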

Answer 4

    // Lazy .*? and a negated class stop at the first "/", "?" or "#" after the scheme.
    const baseUrl = url.replace(/^(.*?:\/\/[^?\/#]*).*$/, '$1');

Answer 5

You can do the following:

// Cut at the first "#" and "?", then drop a trailing slash.
var baseUrl = url.split("#")[0].split("?")[0].replace(/\/$/, "");
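
The split-based idea above leaves any path in place. A sketch that also drops the path, still without a regular expression, could look like the following. The function name `baseUrlWithoutRegex` is hypothetical, and the sketch assumes the host ends at the first "/", "?" or "#" after "://".

```javascript
// Hypothetical helper: cuts the string at the first "/", "?" or "#"
// that appears after the "://" separator.
function baseUrlWithoutRegex(url) {
  const schemeEnd = url.indexOf("://");
  // Start searching after "://" so its slashes are not treated as delimiters.
  const hostStart = schemeEnd === -1 ? 0 : schemeEnd + 3;
  let end = url.length;
  for (const delimiter of ["/", "?", "#"]) {
    const index = url.indexOf(delimiter, hostStart);
    if (index !== -1 && index < end) end = index;
  }
  return url.slice(0, end);
}

console.log(baseUrlWithoutRegex("http://www.apple.com/#anchor=true&slash=true&whatever=foo")); // http://www.apple.com
console.log(baseUrlWithoutRegex("https://apple.com?query=true&slash=false"));                  // https://apple.com
```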

Get the base URL from a string using regex and JavaScript
