Question

I'm using Ruby, and I'm trying to find a way to grab the text between {start_grab_entries} and {end_grab_entries}, like so:

{start_grab_entries}
i want to grab
the text that
you see here in
the middle
{end_grab_entries}

Like this:

$1 => "i want to grab
       the text that
       you see here in
       the middle"

So far, I've tried this as my regex:

\{start_grab_entries}(.|\n)*\{end_grab_entries}

However, $1 gives me a blank. Do you know what I can do to correctly grab the block of text between the tags?

Answer 1

There is a better way to let the dot match newlines (the /m modifier):

regexp = /\{start_grab_entries\}(.*?)\{end_grab_entries\}/m

Also, make the * lazy by appending ?; otherwise you may match too much if more than one such section occurs in your input.
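A minimal sketch comparing the greedy and lazy forms on a made-up input that contains two tagged sections (the sample text is an assumption for illustration). In Ruby, /m is what lets . match a newline, which is what Perl and PCRE call /s; ^ and $ already match at every line boundary in Ruby.

```ruby
# Made-up input with two tagged sections, to show why the lazy quantifier matters.
text = <<~TEXT
  {start_grab_entries}
  first block
  {end_grab_entries}
  some text between the sections
  {start_grab_entries}
  second block
  {end_grab_entries}
TEXT

greedy = /\{start_grab_entries\}(.*)\{end_grab_entries\}/m
lazy   = /\{start_grab_entries\}(.*?)\{end_grab_entries\}/m

# Greedy .* runs to the LAST end marker, swallowing both sections and the text between them.
greedy_capture = text[greedy, 1]

# Lazy .*? stops at the FIRST end marker it can reach.
lazy_capture = text[lazy, 1]

p greedy_capture
p lazy_capture # => "\nfirst block\n"
```

Both captures still include the newline right after the start tag and right before the end tag; calling String#strip on the capture gives the trimmed text the question asks for.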

That said, the reason you got a blank match is that you repeated the capturing group itself, so only the last repetition was captured (in this case, \n).
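A quick sketch that reproduces the blank result, using a shortened version of the question's input (an assumption). Because the group (.|\n) itself is repeated, it is overwritten on every iteration, and $1 ends up holding only the newline just before {end_grab_entries}.

```ruby
# Shortened sample input; the pattern is the one from the question.
input = "{start_grab_entries}\ni want to grab\nthe middle\n{end_grab_entries}"

# (.|\n) is re-captured on every iteration of *, so only its last match survives.
input =~ /\{start_grab_entries}(.|\n)*\{end_grab_entries}/
last_repetition = $1

p last_repetition # => "\n"
```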

If you put the capturing group outside the repetition, it "works":

\{start_grab_entries\}((?:.|\n)*)\{end_grab_entries\}

But, as mentioned above, there are better ways.
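An illustrative check, on an assumed single-section input, that the non-capturing-group pattern captures the whole block and agrees with the /m version. Note that (?:.|\n)* is still greedy, so with several sections it has the same too-much-matching problem unless you append ?.

```ruby
input = "{start_grab_entries}\ni want to grab\nthe text that\n{end_grab_entries}"

# The repetition lives inside a non-capturing group (?:...); one outer group captures the whole run.
noncapturing = input[/\{start_grab_entries\}((?:.|\n)*)\{end_grab_entries\}/, 1]

# The /m form: here . also matches "\n", and .*? is lazy.
dotall = input[/\{start_grab_entries\}(.*?)\{end_grab_entries\}/m, 1]

p noncapturing
p noncapturing == dotall # => true for this single-section input
p dotall.strip           # => "i want to grab\nthe text that"
```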

Answer 2

string=<<EOF
blah
{start_grab_entries}
i want to grab
the text that
you see here in
the middle
{end_grab_entries}
blah
EOF

# scan returns one array of captures per match; puts flattens them and prints each capture
puts string.scan(/\{start_grab_entries\}(.*?)\{end_grab_entries\}/m)
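When the input holds more than one section, the shape of scan's result matters: with one capture group it returns an array of one-element arrays. A small sketch, with a made-up two-section input, that turns that into a plain list of trimmed blocks:

```ruby
string = <<~EOF
  blah
  {start_grab_entries}
  first block
  {end_grab_entries}
  blah
  {start_grab_entries}
  second block
  {end_grab_entries}
EOF

# One inner array per match, each holding the single capture group.
matches = string.scan(/\{start_grab_entries\}(.*?)\{end_grab_entries\}/m)

# Flatten to plain strings and trim the newlines that sit next to the tags.
blocks = matches.flatten.map(&:strip)

p matches # => [["\nfirst block\n"], ["\nsecond block\n"]]
p blocks  # => ["first block", "second block"]
```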

Answer 3

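I'm adding this because we often read from files or data streams where the lines we want are never all in memory at once. "Slurping" a file is discouraged if the data might exceed available memory, which can easily happen in a production enterprise environment. This is how we grab the lines between certain boundary markers while scanning through a file. It doesn't rely on regular expressions; instead, it uses Ruby's "flip-flop" .. operator: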

#!/usr/bin/ruby

lines = []
DATA.each_line do |line|
  # flip-flop: true from the start-marker line through the end-marker line, inclusive
  lines << line if (line['{start_grab_entries}'] .. line['{end_grab_entries}'])
end

puts lines          # << lines with boundary markers
puts
puts lines[1 .. -2] # << lines without boundary markers

__END__
this is not captured

{start_grab_entries}
i want to grab
the text that
you see here in
the middle
{end_grab_entries}

this is not captured either

The output of this code is:

{start_grab_entries}
i want to grab
the text that
you see here in
the middle
{end_grab_entries}

i want to grab
the text that
you see here in
the middle
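A sketch of the same flip-flop idea packaged for real files, under some assumptions: the helper name grab_sections and the marker constants are hypothetical, and the input can be any object that yields lines through each_line (a File, $stdin, a StringIO). It drops the marker lines and returns one string per section, so several sections in one stream stay separate; only the wanted lines are ever held in memory.

```ruby
require "stringio"

# Hypothetical names chosen for this sketch.
START_MARKER = "{start_grab_entries}"
END_MARKER   = "{end_grab_entries}"

# Reads io one line at a time and returns one string per tagged section,
# without the marker lines themselves.
def grab_sections(io)
  sections = []
  current = nil
  io.each_line do |line|
    # Flip-flop: true from a start-marker line through the next end-marker line.
    next unless (line.include?(START_MARKER) .. line.include?(END_MARKER))

    if line.include?(START_MARKER)
      current = []               # opening marker: begin a new section
    elsif line.include?(END_MARKER)
      sections << current.join   # closing marker: finish the section
    else
      current << line            # a line inside the section
    end
  end
  sections
end

sample = StringIO.new(<<~TEXT)
  this is not captured
  {start_grab_entries}
  i want to grab
  the middle
  {end_grab_entries}
  this is not captured either
TEXT

p grab_sections(sample) # => ["i want to grab\nthe middle\n"]
```

With a file on disk this would be called as File.open(path) { |f| grab_sections(f) }, where path is whatever file you are scanning; File#each_line reads it line by line rather than loading it whole.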

Regex: How do I grab a block of text with a regular expression? (in Ruby)