Question

I want to patch some text data extracted from a web page. Sample:

t="First sentence. Second sentence.Third sentence."

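There is no space after the period at the end of the second sentence. This indicates that the third sentence was on a separate line (after a br tag) in the original document.

I want to use this regex to insert "\n" characters at the right places and patch my text. My regex: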

t2=t.gsub(/([.\!?])([A-Z1-9])/,$1+"\n"+$2)

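But unfortunately it doesn't work: "NoMethodError: undefined method `+' for nil:NilClass". How do I properly backreference the matched groups? In Microsoft Word this was easy, I just needed to use the \1 and \2 symbols.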

Answer 1

You can backreference in the replacement string with \1 (to refer to capture group 1).

t = "First sentence. Second sentence.Third sentence!Fourth sentence?Fifth sentence."
t.gsub(/([.!?])([A-Z1-9])/, "\\1\n\\2") # => "First sentence. Second sentence.\nThird sentence!\nFourth sentence?\nFifth sentence."

Answer 2

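If you are using gsub(regex, replacement), use '\1', '\2', ... to refer to the matches. Make sure not to put double quotes around the replacement, or else escape the backslash as in Joshua's answer. The conversion from '\1' to the match is done inside gsub, not by the string literal's escape processing.
If you are using gsub(regex){replacement}, use $1, $2, ...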
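The two forms above can be sketched as follows, reusing the sample string from the question. The variable names `pattern`, `with_string`, `with_single`, and `with_block` are only illustrative.

```ruby
t = "First sentence. Second sentence.Third sentence."
pattern = /([.!?])([A-Z1-9])/

# String replacement: gsub itself expands \1 and \2.
# Inside double quotes the backslash must be doubled, otherwise "\1"
# is read by Ruby as the octal escape for the character U+0001.
with_string = t.gsub(pattern, "\\1\n\\2")

# Single quotes keep '\1' intact, but '\n' would then be a literal
# backslash followed by n, so the newline comes from a double-quoted string.
with_single = t.gsub(pattern, '\1' + "\n" + '\2')

# Block form: the block runs once per match, after $1 and $2 have been set.
with_block = t.gsub(pattern) { "#{$1}\n#{$2}" }

puts with_string
```

The attempt in the question fails because `$1+"\n"+$2` is an ordinary argument expression: Ruby evaluates it before gsub is called, at a point where no match has happened yet and `$1` is still nil.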

But for your case, it is easier not to use captures at all:

t2 = t.gsub(/(?<=[.\!?])(?=[A-Z1-9])/, "\n")
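As a quick check, assuming the sample string from the question as input, the zero-width version can be compared against the capture-group version. The name `t3` is introduced here only for the comparison.

```ruby
t = "First sentence. Second sentence.Third sentence."

# (?<=...) and (?=...) match a position, not characters, so nothing is
# consumed and "\n" is simply inserted between the punctuation mark
# and the following capital letter or digit.
t2 = t.gsub(/(?<=[.!?])(?=[A-Z1-9])/, "\n")

# Capture-group version for comparison.
t3 = t.gsub(/([.!?])([A-Z1-9])/, "\\1\n\\2")

puts t2 == t3  # prints true for this input
```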

Answer 3

If you came here because Rubocop complains "Avoid the use of Perl-style backrefs." about $1, $2, etc., you can do this instead:

some_id = $1
# or
some_id = Regexp.last_match[1] if Regexp.last_match

some_id = $5
# or
some_id = Regexp.last_match[5] if Regexp.last_match
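For context, here is a small sketch of how this might look around an actual match. The input string `line` and the pattern are made up for illustration. `Regexp.last_match(1)` is the call form of the same lookup and returns nil when there was no match, so it needs no trailing guard.

```ruby
line = "user_id=42"  # hypothetical input

if line =~ /user_id=(\d+)/
  some_id = Regexp.last_match(1)  # same value as $1 at this point
end

# Using the MatchData returned by match avoids the globals entirely.
m = /user_id=(\d+)/.match(line)
other_id = m && m[1]

puts some_id
```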

It also wants you to do

%r{//}.match(some_string)

instead of

some_string[//]

Lame (Rubocop).

How do I backreference in a Ruby regular expression (regex) with gsub when I use grouping?
