Question

I am using the gsub method with a regular expression:

@text.gsub(/(-\n)(\S+)\s/) { "#{$2}\n" }

Sample input:

"The wolverine is now es-
sentially absent from 
the southern end
of its European range."

It should return:

"The wolverine is now essentially
absent from  
the southern end
of its European range."

The method works fine, but RuboCop reports an offense:

Avoid the use of Perl-style backrefs.

How can I rewrite it using a MatchData object instead of $2?

Answer 1

You can use a backslash backreference in the replacement string, with no block:

@text.gsub(/(-\n)(\S+)\s/, "\\2\n")

Also, using a single group is a bit cleaner, since the first group above is not needed:

@text.gsub(/-\n(\S+)\s/, "\\1\n")
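As a quick check of the two forms above, here is a minimal sketch that runs both on the sample text from the question. The variable names (text, two_groups, one_group) are only illustrative and do not come from the original post.

```ruby
# Sample text from the question, with a word hyphenated across a line break.
text = "The wolverine is now es-\nsentially absent from \n" \
       "the southern end\nof its European range."

# Two capture groups: \\2 in the replacement refers to the second group.
two_groups = text.gsub(/(-\n)(\S+)\s/, "\\2\n")

# One capture group: \\1 refers to the word fragment after the line break.
one_group = text.gsub(/-\n(\S+)\s/, "\\1\n")

puts one_group
puts two_groups == one_group
```

Both calls should give the same string, because the hyphen and newline are consumed by the match either way and only the captured fragment is written back.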

Answer 2

If you want to use Regexp.last_match:

@text.gsub(/(-\n)(\S+)\s/) { Regexp.last_match[2] + "\n" }

Or:

@text.gsub(/-\n(\S+)\s/) { Regexp.last_match[1] + "\n" }

Note that the block form of gsub should be used when logic is involved. Without logic, passing "\\1\n" or '\1' + "\n" as the second argument works fine.
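To illustrate the distinction, here is a small sketch that contrasts the two cases. The named group rest and the rule of upcasing the rejoined fragment are made up for illustration; they are not part of the original question.

```ruby
text = "now es-\nsentially absent"

# No logic: a plain replacement string is enough.
plain = text.gsub(/-\n(\S+)\s/, "\\1\n")

# Logic involved: the block reads the capture from the MatchData object.
# The upcase rule here is a hypothetical example of per-match logic.
with_logic = text.gsub(/-\n(?<rest>\S+)\s/) do
  match = Regexp.last_match # MatchData for the current match
  "#{match[:rest].upcase}\n"
end

puts plain
puts with_logic
```

Regexp.last_match also accepts an index directly, as in Regexp.last_match(1), which avoids an explicit MatchData variable when only one group is needed.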

Answer 3

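This solution accounts for stray whitespace before the newline and for split words that end a sentence or the string. It uses String#gsub with a block and no capture groups.

Code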

R = /
    [[:alpha:]]\- # match a letter followed by a hyphen
    \s*\n         # match a newline possibly preceded by whitespace
    [[:alpha:]]+  # match one or more letters
    [.?!]?        # possibly match a sentence terminator
    \n?           # possibly match a newline 
    \s*           # match zero or more whitespaces
    /x            # free-spacing regex definition mode

def remove_hyphens(str)
  str.gsub(R) { |s| s.gsub(/[\n\s-]/, '') << "\n" }
end

Examples

str =<<_       
The wolverine is now es-
sentially absent from
the south-
ern end of its
European range.
_

puts remove_hyphens(str)
The wolverine is now essentially
absent from
the southern
end of its
European range.

puts remove_hyphens("now es-  \nsentially\nabsent")
now essentially
absent

puts remove_hyphens("now es-\nsentially.\nabsent")
now essentially.
absent

remove_hyphens("now es-\nsentially?\n")
  #=> "now essentially?\n" (no extra \n at end)
