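I'm trying to write a script that finds overlapping regular expression matches. I got some pointers from a previous question, but I can't get past my problem. I want to find every substring that starts with 'aaa', is followed by any multiple of 3 characters, and ends with 'ccc'. My script looks like this: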
#!/usr/bin/perl
$string = 'aaatttcccxaaattcccxaaattttttcccxaaattttttccc';
while ($string =~ /aaa(...)+?ccc/ig) {
    my $string_name = $&;
    my $len_string = length $&;
    my $position = pos $string;
    my $start_position = ($position - $len_string) + 1;
    my $end_position = pos $string;
    print "String \'$string_name\' of length $len_string was found at position $start_position through $end_position.\n\n";
    print "\n";
}
My output is as follows:
String 'aaatttccc' of length 9 was found at position 1 through 9.
String 'aaattcccxaaattttttccc' of length 21 was found at position 11 through 31.
String 'aaattttttccc' of length 12 was found at position 33 through 44.
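The string 'aaattttttccc' contained in the second match (positions 20 through 31) is not found on its own; it should be the third line of output.
How can I make the script find overlapping matches?
Thanks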
Answer 0 (score: 0)
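The problem is that once the regex has matched 'aaattcccxaaattttttccc', the match position is already at the end of 'aaattcccxaaattttttccc', and the engine cannot go back to match the 'aaattttttccc' inside it.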
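To see this with the string from the question, here is a minimal sketch that records pos() after each match of the original pattern. The helper name match_ends is made up for this illustration; pos() returns the 0-based offset just past the last match, which is where the next /g search starts.

```perl
#!/usr/bin/perl
use strict; use warnings;

# match_ends is a hypothetical helper, not part of the original script.
# It returns pos() after every match of the question's pattern.
sub match_ends {
    my ($s) = @_;
    my @ends;
    while ($s =~ /aaa(?:...)+?ccc/gi) {
        push @ends, pos($s);   # the next search resumes from this offset
    }
    return @ends;
}

my $input = 'aaatttcccxaaattcccxaaattttttcccxaaattttttccc';
print join(', ', match_ends($input)), "\n";   # prints: 9, 31, 44
```

The second match ends at offset 31, so the 'aaa' that begins at offset 19 has already been consumed and is never tried as a starting point.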
You can use a lookahead regex:
(aaa)(?=((?:...)+?ccc))
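Because the lookahead is zero-width, each match consumes only the leading 'aaa', so the next search resumes right after it and can find a match that lies inside the previous one. For each match, join captured group #1 and captured group #2 to get the full string.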
#!/usr/bin/perl
use strict; use warnings;
my $str = 'aaatttcccxaaattcccxaaattttttcccxaaattttttccc';
while ( $str =~ /(aaa)(?=((?:...)+?ccc))/g ) {
    print "$1$2\n";
}
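If you also need the length and positions in the same format as the question's output, something along these lines may work. This is only a sketch: overlapping_matches is a hypothetical helper name, and it assumes 1-based positions as in the question. It reads the start offset of group #1 from the built-in @- array ($-[1]) instead of pos(), because with a lookahead pos() only points just past the 'aaa'.

```perl
#!/usr/bin/perl
use strict; use warnings;

# overlapping_matches is a hypothetical helper for this illustration.
# It returns one [string, start, end] triple per match, 1-based positions.
sub overlapping_matches {
    my ($str) = @_;
    my @found;
    while ($str =~ /(aaa)(?=((?:...)+?ccc))/gi) {
        my $match = $1 . $2;                      # full matched string
        my $start = $-[1] + 1;                    # $-[1] is the 0-based start of group #1
        my $end   = $start + length($match) - 1;
        push @found, [$match, $start, $end];
    }
    return @found;
}

my $input = 'aaatttcccxaaattcccxaaattttttcccxaaattttttccc';
for my $m (overlapping_matches($input)) {
    printf "String '%s' of length %d was found at position %d through %d.\n",
        $m->[0], length($m->[0]), $m->[1], $m->[2];
}
```

For the string in the question this reports four matches, including 'aaattttttccc' at position 20 through 31. Note that each match still consumes 'aaa', so two matches that start fewer than three characters apart would not both be found; moving the whole pattern inside the lookahead, as in /(?=(aaa(?:...)+?ccc))/g, is one way to cover that case.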