Question

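Good afternoon,

I'm learning how to use RegEx in Ruby and have reached a point where I need some help. I'm trying to extract zero or more URLs from a string.

This is the code I'm using: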

sStrings = ["hello world: http://www.google.com", "There is only one url in this string http://yahoo.com . Did you get that?", "The first URL in this string is http://www.bing.com and the second is http://digg.com","This one is more complicated http://is.gd/12345 http://is.gd/4567?q=1", "This string contains no urls"]
sStrings.each  do |s|
  x = s.scan(/((http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.[\w-]*)?)/ix)
  x.each do |url|
    puts url
  end
end

This is what it returns:

http://www.google.com
http
.google
nil
nil
http://yahoo.com
http
nil
nil
nil
http://www.bing.com
http
.bing
nil
nil
http://digg.com
http
nil
nil
nil
http://is.gd/12345
http
nil
/12345
nil
http://is.gd/4567
http
nil
/4567
nil

What is the best way to extract only the full URLs, and not the individual parts captured by the RegEx?

Answer 1

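You can use non-capturing groups (?:...) instead of (...). When a pattern contains capturing groups, String#scan returns the groups of each match rather than the whole match, which is why you are seeing the pieces.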
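As a rough sketch of that idea, here is the regex from the question with every (...) turned into (?:...) and the outer wrapping group dropped. The constant name URL_PATTERN and the sample string are just for illustration; the pattern itself is otherwise unchanged and is not meant as a robust URL matcher.

```ruby
# Sketch: the question's regex with all groups made non-capturing.
# URL_PATTERN is a name invented for this example.
URL_PATTERN = /(?:http|https):\/\/[a-z0-9]+(?:[\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(?:(?:[0-9]{1,5})?\/.[\w-]*)?/i

text = "The first URL in this string is http://www.bing.com and the second is http://digg.com"

# With no capturing groups, String#scan returns an array of whole matches.
urls = text.scan(URL_PATTERN)
urls.each { |url| puts url }
# prints:
# http://www.bing.com
# http://digg.com
```

Because scan now yields plain strings instead of arrays of groups, the inner loop from the question prints one line per URL.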

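I see you're doing this to learn regex, but if you really want to extract URLs from a String, take a look at URI.extract, which extracts URIs from a String (require "uri" in order to use it).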
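A minimal sketch of the URI.extract approach, assuming you only care about http and https links (the scheme list and the sample string are my own choices for the example). Note that newer Ruby versions flag URI.extract as obsolete when warnings are enabled, so treat this as an illustration rather than a recommendation for new code.

```ruby
require "uri"

text = "The first URL in this string is http://www.bing.com and the second is http://digg.com"

# The optional second argument limits the result to the given schemes.
# Without it, anything that looks like "scheme:rest" may be returned.
found = URI.extract(text, %w[http https])
p found
```

A string with no URLs in it gives back an empty array, so the "0 to many" case from the question needs no special handling.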

Answer 2

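You can create non-capturing groups with (?:SUB_PATTERN). Here is an illustration, with a few additional simplifications. Also, since you are using the /x option, you can take advantage of it by laying out your regex in a readable way.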

sStrings = [
    "hello world: http://www.google.com",
    "There is only one url in this string http://yahoo.com . Did you get that?",
    "... is http://www.bing.com and the second is http://digg.com",
    "This one is more complicated http://is.gd/12345 http://is.gd/4567?q=1",
    "This string contains no urls",
]

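sStrings.each do |s|
    x = s.scan(/
        https?:\/\/            # scheme
        \w+                    # first host label
        (?: [.-]\w+ )*         # remaining host labels
        (?:
            \/
            [0-9]{1,5}         # numeric path segment
            (?: \? [\w=]* )?   # optional query string
        )?
    /ix)

    p x
end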

This is fine for learning, but don't try to match URLs this way in real code. There are existing tools for that.

Extracting URLs (to an array) in Ruby
