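I have text like this:
00:00 stuff
00:01 more stuff
multi line
and going
00:02 still
have
So I have no end-of-block marker, only the start of the next block.
I want to get all of the blocks recursively:
1 = 00:00 stuff
2 = 00:01 more stuff
multi line
and going
etc.
The code below only gives me this: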
$VAR1 = '00:00';
$VAR2 = '';
$VAR3 = '00:01';
$VAR4 = '';
$VAR5 = '00:02';
$VAR6 = '';
What am I doing wrong?
use Data::Dumper;

my $text = '00:00 stuff
00:01 more stuff
multi line
and going
00:02 still
have
';
my @array = $text =~ m/^([0-9]{2}:[0-9]{2})(.*?)/gms;
print Dumper(@array);
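Answer 0 (score: 4)
Version 5.10.0 introduced named capture groups, which are very useful for matching non-trivial patterns. The perlre documentation describes them as follows:
(?<NAME>pattern)
(?'NAME'pattern)
A named capture group. Identical in every respect to normal capturing parentheses () but for the additional fact that the group can be referred to by name in various regular expression constructs (like \g{NAME}) and can be accessed by name after a successful match via %+ or %-. See perlvar for more details on the %+ and %- hashes.
If multiple distinct capture groups have the same name, then $+{NAME} will refer to the leftmost defined group in the match.
The forms (?'NAME'pattern) and (?<NAME>pattern) are equivalent.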
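As a minimal sketch of the two access paths described above (the group names hh, mm, and n and the sample strings are made up for illustration), the following reads named groups back through %+ and uses \g{NAME} as a backreference:

```perl
use strict;
use warnings;
use 5.010;  # named captures need Perl 5.10 or later

my $line = '00:01 more stuff';

# Read the named groups back through the %+ hash after a successful match.
my ($hh, $mm);
if ($line =~ /^(?<hh>[0-9]{2}):(?<mm>[0-9]{2})/) {
    ($hh, $mm) = ($+{hh}, $+{mm});
    print "hours=$hh minutes=$mm\n";   # prints "hours=00 minutes=01"
}

# \g{NAME} refers back to whatever the named group matched.
my $repeated = '12:12' =~ /^(?<n>[0-9]{2}):\g{n}$/ ? 1 : 0;
print "repeated=$repeated\n";          # prints "repeated=1"
```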
Named capture groups let us give names to subpatterns within a regex, as below.
use 5.10.0;  # named capture buffers

my $block_pattern = qr/
  (?<time>(?&_time)) (?&_sp) (?<desc>(?&_desc))

  (?(DEFINE)
    # timestamp at logical beginning-of-line
    (?<_time> (?m:^) [0-9][0-9]:[0-9][0-9])

    # runs of spaces or tabs
    (?<_sp> [ \t]+)

    # description is everything through the end of the record
    (?<_desc>
      # s switch makes . match newline too
      (?s: .+?)

      # terminate before optional whitespace (which we remove) followed
      # by either end-of-string or the start of another block
      (?= (?&_sp)? (?: $ | (?&_time)))
    )
  )
/x;
Use it as in
my $text = '00:00 stuff
00:01 more stuff
multi line
and going
00:02 still
have
';
while ($text =~ /$block_pattern/g) {
print "time=[$+{time}]\n",
"desc=[[[\n",
$+{desc},
"]]]\n\n";
}
Output:
$ ./blocks-demo
time=[00:00]
desc=[[[
stuff
]]]

time=[00:01]
desc=[[[
more stuff
multi line
and going
]]]

time=[00:02]
desc=[[[
still
have
]]]
Answer 1 (score: 3)
This should do the trick. The start of the next \d\d:\d\d is treated as the end of the block.
use strict;
my $Str = '00:00 stuff
00:01 more stuff
multi line
and going
00:02 still
have
00:03 still
have' ;
my @Blocks = ($Str =~ m#(\d\d:\d\d.+?(?:(?=\d\d:\d\d)|$))#gs);
print join "--\n", @Blocks;
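Answer 2 (score: 0)
Your problem is that .*? is non-greedy in the same way that .* is greedy. When nothing forces it to consume more, it matches as little as possible, which in this case is the empty string.
So you need something after the non-greedy match to anchor your capture. I came up with this regex: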
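A small sketch of that behaviour, using a single made-up line rather than the full text from the question: with nothing after it, the lazy group is satisfied by zero characters, while an anchor placed after it forces it to expand.

```perl
use strict;
use warnings;

my $line = '00:00 stuff';

# Nothing follows the lazy group, so it is satisfied by zero characters.
my ($time, $rest) = $line =~ /^([0-9]{2}:[0-9]{2})(.*?)/;
print "unanchored: [$rest]\n";    # prints "unanchored: []"

# An end-of-string anchor after the lazy group forces it to expand.
my ($time2, $rest2) = $line =~ /^([0-9]{2}:[0-9]{2})(.*?)$/;
print "anchored:   [$rest2]\n";   # prints "anchored:   [ stuff]"
```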
my @array = $text =~ m/\n?([0-9]{2}:[0-9]{2}.*?)(?=\n[0-9]{2}:|$)/gs;
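As you can see, I removed the /m option so that $ in the lookahead assertion matches only at the end of the string.
You might also consider this solution: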
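To illustrate why /m matters here, the sketch below runs the same pattern with and without it on a shortened, made-up sample. Under /m, $ also matches before every newline, so the lazy .*? stops at the end of the first line and the continuation line is lost.

```perl
use strict;
use warnings;

my $text = "00:01 more stuff\nmulti line\n00:02 still\n";

# Without /m, $ matches only at the very end (or before a final newline).
my @without_m = $text =~ m/\n?([0-9]{2}:[0-9]{2}.*?)(?=\n[0-9]{2}:|$)/gs;

# With /m, $ also matches before every newline, so the lazy .*? stops early.
my @with_m = $text =~ m/\n?([0-9]{2}:[0-9]{2}.*?)(?=\n[0-9]{2}:|$)/gms;

print "without /m: [$without_m[0]]\n";   # first block keeps "multi line"
print "with /m:    [$with_m[0]]\n";      # first block is cut to one line
```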
my @array = split /(?=[0-9]{2}:[0-9]{2})/, $text;
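One caveat, sketched below under the assumption that a description could itself contain something that looks like a timestamp (the sample text is invented, not from the question): the unanchored lookahead splits mid-line, whereas adding ^ with /m limits the split points to line starts.

```perl
use strict;
use warnings;

# Hypothetical input: the first description itself contains a time.
my $text = "00:00 call at 10:30\n00:01 done\n";

# Unanchored lookahead: splits before every NN:NN, even mid-line.
my @loose = split /(?=[0-9]{2}:[0-9]{2})/, $text;

# Anchored with ^ under /m: splits only where a line starts with NN:NN.
my @anchored = split /^(?=[0-9]{2}:[0-9]{2})/m, $text;

print scalar(@loose), " vs ", scalar(@anchored), "\n";   # prints "3 vs 2"
```

For the text in the question, where timestamps appear only at line starts, both forms should yield the same three blocks.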