Question

I have text like this:

00:00 stuff
00:01 more stuff
multi line
  and going
00:02 still 
    have

So I don't have a block end marker, just the start of a new block.

I want to get all the blocks, one after another:

1 = 00:00 stuff
2 = 00:01 more stuff
multi line
  and going

etc.

The code below only gives me this:

$VAR1 = '00:00';
$VAR2 = '';
$VAR3 = '00:01';
$VAR4 = '';
$VAR5 = '00:02';
$VAR6 = '';

What am I doing wrong?

my $text = '00:00 stuff
00:01 more stuff
multi line
 and going
00:02 still 
have
    ';
my @array = $text =~ m/^([0-9]{2}:[0-9]{2})(.*?)/gms;
print Dumper(@array);

Answer 1

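Version 5.10.0 introduced named capture groups, which are useful for matching nontrivial patterns. From the perlre documentation:

(?<NAME>pattern)
(?'NAME'pattern)

A named capture group. Identical in every respect to normal capturing parentheses () but for the additional fact that the group can be referred to by name in various regular expression constructs (like \g{NAME}) and can be accessed by name after a successful match via %+ or %-. See perlvar for more details on the %+ and %- hashes.

If multiple distinct capture groups have the same name, then $+{NAME} will refer to the leftmost defined group in the match.

The forms (?'NAME'pattern) and (?<NAME>pattern) are equivalent.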
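As a small illustration of the %+ hash described above, here is a minimal sketch on a single line of input. The variable names ($line, $time, $desc) and the sample line are made up for the example and are not part of the answer's code.

```perl
use strict;
use warnings;
use 5.010;  # named captures need Perl 5.10 or later

my $line = '00:01 more stuff';
my ($time, $desc);

# (?<time>...) and (?<desc>...) are named groups separated by whitespace.
if ($line =~ /^(?<time>[0-9]{2}:[0-9]{2})\s+(?<desc>.*)$/) {
    # After a successful match the captures are available through %+.
    ($time, $desc) = ($+{time}, $+{desc});
    print "time=$time\n";
    print "desc=$desc\n";
}
```

The same groups are still available positionally as $1 and $2; the names mainly help once a pattern grows beyond a couple of groups.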

Named capture groups allow us to name subpatterns within the regex, as below.

use 5.10.0;  # named capture buffers

my $block_pattern = qr/
  (?<time>(?&_time)) (?&_sp) (?<desc>(?&_desc))

  (?(DEFINE)
    # timestamp at logical beginning-of-line
    (?<_time> (?m:^) [0-9][0-9]:[0-9][0-9])

    # runs of spaces or tabs
    (?<_sp> [ \t]+)

    # description is everything through the end of the record
    (?<_desc>
      # s switch makes . match newline too
      (?s: .+?)

      # terminate before optional whitespace (which we remove) followed
      # by either end-of-string or the start of another block
      (?= (?&_sp)? (?: $ | (?&_time)))
    )
  )
/x;

Using it as in

my $text = '00:00 stuff
00:01 more stuff
multi line
 and going
00:02 still 
have
    ';

while ($text =~ /$block_pattern/g) {
  print "time=[$+{time}]\n",
        "desc=[[[\n",
        $+{desc},
        "]]]\n\n";
}

Output:

$ ./blocks-demo
time=[00:00]
desc=[[[
stuff
]]]

time=[00:01]
desc=[[[
more stuff
multi line
 and going
]]]

time=[00:02]
desc=[[[
still 
have
]]]

Answer 2

This should do the trick. The start of the next \d\d:\d\d is treated as the end of the block.

use strict;

my $Str = '00:00 stuff
00:01 more stuff
multi line
  and going
00:02 still 
    have
00:03 still 
    have' ;

my @Blocks = ($Str =~ m#(\d\d:\d\d.+?(?:(?=\d\d:\d\d)|$))#gs);

print join "--\n", @Blocks;
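Each element of @Blocks is one whole block as a single string. If the timestamp and the text are needed separately, a second small regex per block can pull them apart. This is a minimal sketch, assuming blocks shaped like the ones the regex above returns; @records and the hash keys time and body are names invented for the illustration.

```perl
use strict;
use warnings;

# Sample blocks, shaped like the strings captured by the regex above.
my @Blocks = (
    "00:00 stuff\n",
    "00:01 more stuff\nmulti line\n  and going\n",
);

my @records;
for my $block (@Blocks) {
    # \A and \z pin the match to the whole block; /s lets . cross newlines.
    # The trailing \s* drops whitespace at the end of the block.
    if ($block =~ /\A(\d\d:\d\d)\s+(.*?)\s*\z/s) {
        push @records, { time => $1, body => $2 };
    }
}

print "$_->{time} => [$_->{body}]\n" for @records;
```

Inner newlines and indentation are kept in body; only the whitespace between the timestamp and the text, and at the very end of the block, is removed.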

Answer 3

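Your problem is that .*? is non-greedy in the same way that .* is greedy. When nothing forces it to match more, it matches as little as possible, which in this case is the empty string.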
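The effect can be seen side by side in a short sketch. The first pattern is the one from the question; the second adds a lookahead as one possible anchor (an assumption for this illustration, using \z and a line-start timestamp). The array names @lazy and @anchored are made up for the example.

```perl
use strict;
use warnings;

my $text = "00:00 stuff\n00:01 more stuff\nmulti line\n";

# Nothing follows (.*?), so the lazy quantifier is satisfied with zero characters.
my @lazy = $text =~ /^([0-9]{2}:[0-9]{2})(.*?)/gms;
# @lazy is ('00:00', '', '00:01', '')

# A lookahead after (.*?) forces it to keep consuming until the next
# timestamp at the start of a line, or the end of the string (\z).
my @anchored = $text =~ /^([0-9]{2}:[0-9]{2})(.*?)(?=^[0-9]{2}:[0-9]{2}|\z)/gms;
# @anchored is ('00:00', " stuff\n", '00:01', " more stuff\nmulti line\n")

print scalar(@lazy), " fields, second is '$lazy[1]'\n";
print scalar(@anchored), " fields, second is '$anchored[1]'\n";
```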

So you need something after the non-greedy match to anchor your capture. I came up with this regex:

my @array = $text =~ m/\n?([0-9]{2}:[0-9]{2}.*?)(?=\n[0-9]{2}:|$)/gs;

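As you can see, I removed the /m option so that $ in the look-ahead assertion matches the end of the string rather than the end of every line.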

You might also consider this solution:

my @array = split /(?=[0-9]{2}:[0-9]{2})/, $text;
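One thing to watch with the split approach is that a lookahead with no anchor would also split at an hh:mm that appears in the middle of a line. A hedged variant, which is not part of the answer above, anchors the split point to a line start with ^ and /m. The sample text and the name @blocks are assumptions for the sketch.

```perl
use strict;
use warnings;

my $text = "00:00 stuff\n"
         . "00:01 more stuff\nmulti line\n  and going\n"
         . "00:02 still \n    have\n";

# Split at every line start that is followed by a timestamp.
# /m lets ^ match after each newline; the lookahead consumes nothing,
# so each timestamp stays at the front of its own block.
my @blocks = split /^(?=[0-9]{2}:[0-9]{2})/m, $text;

print "[$_]\n" for @blocks;
```

split does not produce an empty leading field for the zero-width match at the very start of the string, so the first element is already the 00:00 block.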

Perl regex to extract multi-line blocks
