Question

Using a regular expression, how can I remove everything before the first path / in a URL?

Example URL: https://www.example.com/some/page?user=1&email=joe@schmoe.org

From this, I just want /some/page?user=1&email=joe@schmoe.org

If it's just the root domain (i.e. https://www.example.com/), then I just want / returned.

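The domain may or may not have a subdomain, and it may or may not use a secure protocol. Ultimately I just want to strip out anything before the first path slash.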

In case it matters, I'm running Ruby 1.9.3.

Answer 1

Don't use a regular expression for this. Use the URI class. You can write:

require 'uri'

u = URI.parse('https://www.example.com/some/page?user=1&email=joe@schmoe.org')
u.path #=> "/some/page"
u.query #=> "user=1&email=joe@schmoe.org"

# All together; returns just the path when there is no query string (no ?)
u.request_uri #=> "/some/page?user=1&email=joe@schmoe.org"
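
To cover the root-domain case from the question, here is a minimal sketch that wraps the call above. The helper name path_and_query is hypothetical, and it assumes the input is always an http or https URL, since request_uri is only defined for those URI types.

```ruby
require 'uri'

# Hypothetical helper (not part of the original answer).
# Assumes an http:// or https:// URL; other schemes have no request_uri.
def path_and_query(url)
  URI.parse(url).request_uri
end

path_and_query('https://www.example.com/some/page?user=1&email=joe@schmoe.org')
#=> "/some/page?user=1&email=joe@schmoe.org"
path_and_query('https://www.example.com/') #=> "/"
path_and_query('http://example.com')       #=> "/"
```

For HTTP URIs, request_uri returns "/" even when the URL has no trailing slash, which matches what the question asks for.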

Answer 2

 require 'uri'

 uri = URI.parse("https://www.example.com/some/page?user=1&email=joe@schmoe.org")

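 # uri.query is nil when the URL has no query string, so guard before concatenating
 uri.query ? "#{uri.path}?#{uri.query}" : uri.path
 #=> "/some/page?user=1&email=joe@schmoe.org"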

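As Gavin said, using a RegExp for this isn't a good idea, tempting as it is. URLs can contain special characters, even Unicode characters, that you didn't anticipate when writing the RegExp. This is especially likely in the query string. Using the URI library is the safer approach.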
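
As a small illustration of the point about query strings, the sketch below parses a made-up URL whose query itself contains "//" and extra slashes. The URL is an assumption chosen for the example, not one from the question.

```ruby
require 'uri'

# Hypothetical URL: the query string embeds another URL with its own slashes.
url = 'https://www.example.com/login?next=http://other.example/a/b&email=joe@schmoe.org'
uri = URI.parse(url)

uri.host        #=> "www.example.com"
uri.request_uri #=> "/login?next=http://other.example/a/b&email=joe@schmoe.org"
```

The parser splits host, path and query by the URL grammar, so the slashes inside the query do not affect where the path starts.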

Answer 3

You can do the same thing using String#index:

index(substring [, offset])

str = "https://www.example.com/some/page?user=1&email=joe@schmoe.org"
offset = str.index("//") # => 6
str[str.index('/',offset + 2)..-1]
# => "/some/page?user=1&email=joe@schmoe.org"
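
The snippet above raises an error when the URL has no slash after the host (for example http://test.com), because index returns nil. A possible guarded variant is sketched below. The method name strip_before_path is hypothetical, and returning "/" for a missing path is an assumption based on what the question asks for.

```ruby
# Hypothetical helper built on String#index (not from the original answer).
def strip_before_path(str)
  scheme_sep = str.index("//")
  # Start searching after "//" when a scheme is present, else from the beginning.
  start = scheme_sep ? scheme_sep + 2 : 0
  slash = str.index("/", start)
  # No path slash at all: fall back to "/" as the question requests.
  slash ? str[slash..-1] : "/"
end

strip_before_path("https://www.example.com/some/page?user=1") #=> "/some/page?user=1"
strip_before_path("http://test.com")                          #=> "/"
strip_before_path("www.example.com/some/page")                #=> "/some/page"
```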

Answer 4

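I strongly agree with the advice to use the URI module in this case, and I don't consider myself great with regular expressions. Still, it seems worthwhile to demonstrate one possible way to do what you ask.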

test_url1 = 'https://www.example.com/some/page?user=1&email=joe@schmoe.org'
test_url2 = 'http://test.com/'
test_url3 = 'http://test.com'

regex = /^https?:\/\/[^\/]+(.*)/

regex.match(test_url1)[1]
# => "/some/page?user=1&email=joe@schmoe.org"

regex.match(test_url2)[1]
# => "/"

regex.match(test_url3)[1]
# => ""

Note that in the last case the URL has no trailing '/', so the result is an empty string.

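The regular expression (/^https?:\/\/[^\/]+(.*)/) says the string starts (^) with http (http), followed by an optional s (s?), then :// (:\/\/), followed by at least one non-slash character ([^\/]+), followed by zero or more characters, which we want to capture ((.*)).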
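
If the empty string in the last case is unwanted, one possible adjustment is sketched here. The method name path_from_regex is hypothetical, and treating a missing path as "/" is an assumption taken from the question, not part of the regex above.

```ruby
# Hypothetical wrapper around the same pattern (written with %r{} to avoid escaping slashes).
def path_from_regex(url)
  m = %r{\Ahttps?://[^/]+(.*)}.match(url)
  return nil unless m            # not an http(s) URL
  m[1].empty? ? "/" : m[1]       # treat a missing path as the root path
end

path_from_regex('https://www.example.com/some/page?user=1&email=joe@schmoe.org')
#=> "/some/page?user=1&email=joe@schmoe.org"
path_from_regex('http://test.com/') #=> "/"
path_from_regex('http://test.com')  #=> "/"
```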

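I hope you find the example and explanation educational, and I again recommend not using a regular expression in this case. The URI module is simpler to use and far more robust.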

Remove everything before the first slash in a URL?
