Question

I'm trying to do some basic URL sanitization so that

www.google.com
www.google.com/
http://google.com
http://google.com/
https://google.com
https://google.com/

are replaced with http://www.google.com (or https://www.google.com if https:// is at the beginning).

Basically, I'd like to check for http/https at the beginning and a / at the end in a single regex.

I was trying something like this:

"https://google.com".match(/^(http:\/\/|https:\/\/)(.*)(\/)*$/)

In this case I get:

=> #<MatchData "https://google.com" 1:"https://" 2:"google.com" 3:nil>

which is fine.

Unfortunately, with:

"https://google.com/".match(/^(http:\/\/|https:\/\/)(.*)(\/)*$/)

I get:

=> #<MatchData "https://google.com/" 1:"https://" 2:"google.com/" 3:nil>

whereas I would like 2:"google.com" 3:"/".

Any idea how to do that?

Answer 1

It's obvious once you spot the mistake ;)

You were trying:

^(http:\/\/|https:\/\/)(.*)(\/)*$

The answer is to use:

^(http:\/\/|https:\/\/)(.*?)(\/)*$

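This makes the .* quantifier "non-greedy" (lazy), so the trailing forward slash is no longer swallowed by the "." and is left for the final (\/)* group to capture.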
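To see the difference side by side, here is a minimal sketch that runs both patterns from above on the same input. The constant names GREEDY and LAZY are only labels chosen for this illustration.

```ruby
# The two patterns from the answer; only the quantifier in group 2 differs.
GREEDY = /^(http:\/\/|https:\/\/)(.*)(\/)*$/
LAZY   = /^(http:\/\/|https:\/\/)(.*?)(\/)*$/

url = "https://google.com/"

# Greedy: (.*) consumes the trailing slash, so group 3 stays nil.
greedy_captures = GREEDY.match(url).captures
p greedy_captures  # => ["https://", "google.com/", nil]

# Lazy: (.*?) stops as early as it can, leaving the slash for group 3.
lazy_captures = LAZY.match(url).captures
p lazy_captures    # => ["https://", "google.com", "/"]
```

Without a trailing slash, the lazy version still matches; group 3 is simply nil.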

EDIT:

In fact, you should really use:

^(http:\/\/|https:\/\/)?(www\.)?(.*?)(\/)*$

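This way, you will also match your first two examples, which have no "http(s)://" in them. You are also capturing the "www." part separately, so you can tell whether it was present. In action: http://www.rubular.com/r/VUoIUqCzzX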
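As a quick check of that pattern in irb, the sketch below prints the captures for a few of the inputs from the question. The constant name WITH_OPTIONAL_PARTS is just a label for this illustration.

```ruby
# Scheme and "www." are both optional here; each has its own capture group.
WITH_OPTIONAL_PARTS = /^(http:\/\/|https:\/\/)?(www\.)?(.*?)(\/)*$/

# Groups are: scheme, "www.", host (and anything after it), trailing slash.
p WITH_OPTIONAL_PARTS.match("www.google.com").captures
# => [nil, "www.", "google.com", nil]

p WITH_OPTIONAL_PARTS.match("www.google.com/").captures
# => [nil, "www.", "google.com", "/"]

p WITH_OPTIONAL_PARTS.match("http://google.com").captures
# => ["http://", nil, "google.com", nil]
```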

EDIT2：

I was bored and wanted to perfect this :P

Here you go:

^(https?:\/\/)?(?:www\.)?(.*?)\/?$

Now all you need to do is replace each URL with the first capture group (or "http://" if it is nil), followed by "www.", followed by the second capture group.

In action: http://www.rubular.com/r/YLeO5cXcck
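Put together, the replacement step might look something like the sketch below. The helper name sanitize_url and the constant URL_PATTERN are hypothetical and not part of the original answer; defaulting to "http://" when no scheme is given follows the rule described above.

```ruby
# Final pattern from EDIT2: group 1 = optional scheme, group 2 = the rest,
# with an optional "www." prefix and one optional trailing slash left out.
URL_PATTERN = /^(https?:\/\/)?(?:www\.)?(.*?)\/?$/

# Hypothetical helper: rebuilds the URL as scheme + "www." + host.
def sanitize_url(url)
  match = URL_PATTERN.match(url)
  return nil unless match           # defensive; the pattern is very permissive

  scheme = match[1] || "http://"    # fall back to http:// when no scheme given
  "#{scheme}www.#{match[2]}"
end

%w[
  www.google.com
  www.google.com/
  http://google.com
  http://google.com/
  https://google.com
  https://google.com/
].each do |url|
  puts "#{url} -> #{sanitize_url(url)}"
end
# The first four lines print http://www.google.com,
# the last two print https://www.google.com.
```

Note that this always prepends "www.", which is what the question asked for; it is not a general-purpose URL normalizer.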

EDIT (18 months later):

Check out my awesome Ruby gem, which will help solve your problem!

https://github.com/tom-lord/regexp-examples

/(https?:\/\/)?(?:www\.)?google\.com\/?/.examples # => 
  ["google.com",
   "google.com/",
   "www.google.com",
   "www.google.com/",
   "http://google.com",
   "http://google.com/",
   "http://www.google.com",
   "http://www.google.com/",
   "https://google.com",
   "https://google.com/",
   "https://www.google.com",
   "https://www.google.com/"]

/(https?:\/\/)?(?:www\.)?google\.com\/?/.examples.map(&:subgroups) # =>
  [[],
   [],
   [],
   [],
   ["http://"],
   ["http://"],
   ["http://"],
   ["http://"],
   ["https://"],
   ["https://"],
   ["https://"],
   ["https://"]]

Simple URL sanitization
