Question

I have a 141-character regex in my Rails application, and RuboCop doesn't like it.

My regex:

URL_REGEX = /\A(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/[-\w.]+)\z/

This pattern checks a URL plus one level of path, e.g. http(s)://example.com/path
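For reference, here is a quick sketch (not part of the original question) of how this pattern behaves on a few sample strings. The sample URLs are arbitrary. The final path group is not optional, so a bare domain does not match, and since [-\w.] excludes '/', only one path segment is allowed.

```ruby
URL_REGEX = /\A(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/[-\w.]+)\z/

# Each sample maps to the result we expect from the pattern above.
samples = {
  'http://example.com/path'      => true,  # scheme + one path segment
  'https://www.example.com/path' => true,
  'example.com/path'             => true,  # the scheme prefix is optional
  'http://example.com:8080/path' => true,  # optional port
  'http://example.com'           => false, # the path segment is required
  'http://example.com/a/b'       => false  # only one path level is allowed
}

samples.each do |url, expected|
  actual = URL_REGEX.match?(url)
  puts "#{url}: #{actual}#{' (unexpected)' unless actual == expected}"
end
```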

Can you safely split a regex across lines in Ruby? What is the general mechanism for doing that?
How do you tell RuboCop to go easy on regexes?

Thanks a lot!

Answer 1

Try something like this:

regexp = %r{\A(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+
            ([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/[-\w.]+)\z}x

if 'http://example.com/path' =~ regexp
  puts 'matches'
end

The x at the end turns on free-spacing mode, which ignores whitespace and comments in the pattern.

See the last example in the regular-expressions section of the Ruby style guide: https://github.com/github/rubocop-github/blob/master/STYLEGUIDE.md#regular-expressions
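One way to check that splitting the pattern does not change its meaning is to compare a two-line %r{...}x version against the one-line original on a few sample URLs. This is a sketch: the URLs are arbitrary, and the path character class keeps the '-' from the question's [-\w.].

```ruby
# The question's one-line pattern, for reference.
one_line = /\A(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/[-\w.]+)\z/

# The same pattern split over two lines. With the x flag, the line break and
# the indentation of the second line are ignored; the %r delimiters also
# remove the need to escape '/'.
multi_line = %r{\A(http://www\.|https://www\.|http://|https://)?[a-z0-9]+
                ([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(/[-\w.]+)\z}x

urls = %w[
  http://example.com/path
  https://www.example.com/my-path
  example.co.uk/path
  http://example.com
  http://example.com/a/b
]

urls.each do |url|
  puts "#{url}: one_line=#{one_line.match?(url)} multi_line=#{multi_line.match?(url)}"
end
```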

Answer 2

Yes. You can build the regex out of smaller parts and interpolate them into the final regex.

prefix = Regexp.union(%w(http://www. https://www. http:// https://)) # union escapes plain strings itself
letters = '[a-z\d]+' # single quotes, so \d keeps its backslash
URL_REGEX = /\A(#{prefix})?#{letters}([-.]#{letters})*\.[a-z]{2,5}(:[0-9]{1,5})?(\/[-.\w]+)\z/
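Two pitfalls are worth knowing when building a regex from string pieces. Regexp.union already escapes any plain strings passed to it, so calling Regexp.escape on them first escapes them twice. And in a double-quoted Ruby string, an unknown escape such as \d loses its backslash. The sketch below demonstrates both and then assembles the URL pattern from pieces; the variable names are illustrative.

```ruby
prefixes = %w[http://www. https://www. http:// https://]

# Regexp.union escapes plain strings for you: '.' becomes a literal dot.
good = Regexp.union(prefixes)

# Escaping first and then passing the strings to Regexp.union escapes them
# twice, so 'http://www.' now requires a literal backslash before the dot.
doubled = Regexp.union(prefixes.map { |s| Regexp.escape(s) })

puts(/\A#{good}\z/.match?('http://www.'))    # true
puts(/\A#{doubled}\z/.match?('http://www.')) # false

# In a double-quoted string "\d" is just "d", so use single quotes
# (or "\\d") when the backslash has to reach the regex.
puts("[a-z\d]+" == '[a-zd]+')                # true

letters   = '[a-z\d]+'
URL_REGEX = /\A(#{good})?#{letters}([-.]#{letters})*\.[a-z]{2,5}(:[0-9]{1,5})?(\/[-.\w]+)\z/
puts URL_REGEX.match?('https://www.example.com/path') # true
```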

Answer 3

How do you tell RuboCop to go easy on regexes?

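The cop complaining about this is most likely Metrics/LineLength (called Layout/LineLength in newer RuboCop versions). There is no configuration option to ignore regexes, but if you really do want to keep the regex on one line, you can disable the cop inline: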

# rubocop:disable Metrics/LineLength
URL_REGEX = /\A(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/[-\w.]+)\z/
# rubocop:enable Metrics/LineLength

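You can also put a trailing rubocop:disable comment at the end of the line, but since the line is already long the comment is easy to miss, so the disable/enable pair is probably the better choice.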
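For comparison, a sketch of the trailing form mentioned above. The directive applies only to the line it sits on, which in this case is far to the right of the code:

```ruby
# Trailing form: the directive covers this one line only, but it sits at the
# end of a very long line, where it is easy to overlook.
URL_REGEX = /\A(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/[-\w.]+)\z/ # rubocop:disable Metrics/LineLength

puts URL_REGEX.match?('http://example.com/path')
```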

Answer 4

Another option is to use a more concise regex. There are a few places where you repeat parts of the pattern unnecessarily.

/\A(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/[-\w.]+)\z/
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   (https?:\/\/(www\.)?)?
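To check the collapsed prefix, the sketch below compares it with the original four-way alternation on the four prefixes plus two near misses. It escapes the dot after www; with a bare '.', the group would accept any character in that position, as the last two lines show.

```ruby
long_prefix  = /\A(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)\z/
short_prefix = /\A(https?:\/\/(www\.)?)\z/
# Same as short_prefix, but with an unescaped '.', which matches any character.
loose_prefix = /\A(https?:\/\/(www.)?)\z/

candidates = %w[http://www. https://www. http:// https:// http://www https://ww.]
candidates.each do |p|
  puts "#{p.ljust(13)} long=#{long_prefix.match?(p)} short=#{short_prefix.match?(p)}"
end

puts loose_prefix.match?('http://wwwx') # true: the '.' matched the 'x'
puts short_prefix.match?('http://wwwx') # false
```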

With this and a few more changes, I boiled your regex down to:

/\A(https?:\/\/(www\.)?)?[-a-z0-9.]+\.[a-z]{2,5}(:[0-9]{1,5})?(\/[-\w.]+)\z/

It's not exactly equivalent, but here's my test.
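A sketch of such a comparison; the sample strings are arbitrary. It anchors the shorter pattern with \A and \z: in Ruby, ^ and $ match at the start and end of any line, so a multi-line string passes if just one of its lines looks like a URL (Rails' format validator raises an error for ^ and $ unless you pass multiline: true). The output shows where the two patterns differ: the shorter one also accepts a leading '-' and repeated dots.

```ruby
original   = /\A(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/[-\w.]+)\z/
simplified = /\A(https?:\/\/(www\.)?)?[-a-z0-9.]+\.[a-z]{2,5}(:[0-9]{1,5})?(\/[-\w.]+)\z/

urls = %w[
  http://example.com/path
  https://www.example.com:8080/path
  example.co.uk/path
  example..com/path
  -example.com/path
]

urls.each do |url|
  puts "#{url.ljust(34)} original=#{original.match?(url)} simplified=#{simplified.match?(url)}"
end

# With ^ and $ instead of \A and \z, a match on any single line is enough.
line_anchored = /^(https?:\/\/(www\.)?)?[-a-z0-9.]+\.[a-z]{2,5}(:[0-9]{1,5})?(\/[-\w.]+)$/
puts line_anchored.match?("javascript:alert(1)\nexample.com/path") # true
puts simplified.match?("javascript:alert(1)\nexample.com/path")    # false
```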

Answer 5

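This expands on @Gacha's answer. Yes, you want free-spacing mode (/x). In that mode the regex parser discards all unescaped whitespace outside character classes before it builds the regex, so any whitespace character that is meant to be part of the pattern has to be protected. You can do that by putting it in a character class ([ ]), or by writing \p{Space}, [[:space:]] or \s. All of these except the first match any whitespace character (space, tab, newline and a few others), which may or may not be what you want.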
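A small sketch of the whitespace rules just described; the single-letter patterns are arbitrary and only the match results matter:

```ruby
# An unescaped space is ignored in free-spacing mode, so /a b/x means /ab/.
puts(/a b/x.match?('a b'))            # false
puts(/a b/x.match?('ab'))             # true

# A space inside a character class is kept and matches only a space.
puts(/a[ ]b/x.match?('a b'))          # true
puts(/a[ ]b/x.match?("a\tb"))         # false

# \s, [[:space:]] and \p{Space} match any whitespace character.
puts(/a\sb/x.match?("a\tb"))          # true
puts(/a[[:space:]]b/x.match?("a\tb")) # true
puts(/a\p{Space}b/x.match?("a\nb"))   # true
```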

Another benefit of free-spacing mode is that it lets you make the regex self-documenting.

Here you could write the following:

URL_REGEX = 
  /
  \A
  (               # open cap group 1
    https?:\/\/   # match 'http:\/\/' or 'https:\/\/'
    (?:www\.)?    # optionally match 'www.' in non-cap group
  )?              # close cap group 1 and optionally match it
  [a-z0-9]+       # match >= 1 lowercase letters or digits
  (               # open cap group 2
    [-.]          # match '-' or '.' ('{1}' not needed and no
                  # need to escape '-' or '.' in a char class)
    [a-z0-9]+     # match >= 1 lowercase letters or digits 
  )*              # close cap group 2 and match it >= 0 times
  \.              # match a period
  [a-z]{2,5}      # match 2-5 lowercase letters
  (:[0-9]{1,5})?  # optionally match ':' followed by 1-5 
                  # digits in cap group 3
  (               # open cap group 4
    \/            # match '\/'
    [-\w.]+       # match '-', word char or '.' >= 1 times
  )               # close cap group 4
  \z              # match end of string
  /x              # free spacing regex definition mode

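You'll see that I made a few changes to simplify your regex. Note that forward slashes to the right of a # (that is, inside comments) must still be escaped, because an unescaped / would end the /.../ literal.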
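If escaping every slash, including those in comments, feels error-prone, one variant is to use %r{...}x delimiters as in @Gacha's answer, so that '/' needs no escaping anywhere. A sketch of the same pattern in that form; the sample URLs are arbitrary:

```ruby
URL_REGEX = %r{
  \A
  (                 # open cap group 1
    https?://       # match 'http://' or 'https://' (no escaping needed)
    (?:www\.)?      # optionally match 'www.' in a non-capture group
  )?                # close cap group 1 and optionally match it
  [a-z0-9]+         # match >= 1 lowercase letters or digits
  (                 # open cap group 2
    [-.]            # match '-' or '.'
    [a-z0-9]+       # match >= 1 lowercase letters or digits
  )*                # close cap group 2 and match it >= 0 times
  \.                # match a period
  [a-z]{2,5}        # match 2-5 lowercase letters
  (:[0-9]{1,5})?    # optionally match ':' and 1-5 digits in cap group 3
  (                 # open cap group 4
    /               # match '/'
    [-\w.]+         # match '-', word chars or '.' >= 1 times
  )                 # close cap group 4
  \z                # match end of string
}x

puts URL_REGEX.match?('https://www.example.com:8080/my-path')
puts URL_REGEX.match?('http://example.com')
```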

Ruby: How to split a regex across multiple lines?
