Matching overlapping regular expressions in Perl

Time: 2015-01-18 15:49:34

Tags: regex overlapping

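I am trying to write a script that finds overlapping regex matches. I got some pointers from earlier questions, but I can't seem to get past my problem. I want to find every substring that starts with 'aaa', then contains any multiple of 3 characters, and ends with 'ccc'. My script looks like this: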

   #!/usr/bin/perl
   $string = 'aaatttcccxaaattcccxaaattttttcccxaaattttttccc';
   while ($string =~ /aaa(...)+?ccc/ig) {
       my $string_name = $&;
       my $len_string = length $&;
       my $position = pos $string;
       my $start_position = ($position - $len_string) + 1;
       my $end_position = pos $string;
       print "String \'$string_name\' of length $len_string was found at position $start_position through $end_position.\n\n";
       print "\n";
    }

My output is as follows:

      String 'aaatttccc' of length 9 was found at position 1 through 9.

      String 'aaattcccxaaattttttccc' of length 21 was found at position 11 through 31.

      String 'aaattttttccc' of length 12 was found at position 33 through 44.

The string 'aaattttttccc' contained in the second match (at positions 20 through 31) is not found on its own; it should be the third result.

How can I make it find overlapping matches?

Thanks

1 answer:

Answer 0: (score: 0)

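The problem is that once the regex has matched aaattcccxaaattttttccc, the match position is already at the end of aaattcccxaaattttttccc, so the next search starts there and the engine cannot go back to match the aaattttttccc inside it.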
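To see where each search resumes, here is a minimal sketch that records pos() after every match of the original, consuming pattern. The helper name consuming_matches is hypothetical and only used for illustration; the whole match is wrapped in a capture group so that $1 can be used instead of $&.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Hypothetical helper: returns [matched text, pos() after the match]
# for each match of the original (consuming) pattern.
sub consuming_matches {
    my ($str) = @_;
    my @found;
    while ( $str =~ /(aaa(?:...)+?ccc)/g ) {
        # pos() is the 0-based offset where the next search will start
        push @found, [ $1, pos($str) ];
    }
    return @found;
}

my $str = 'aaatttcccxaaattcccxaaattttttcccxaaattttttccc';
printf "%-22s next search starts at offset %d\n", @$_
    for consuming_matches($str);
```

Every character that was part of a match lies before the offset where the next search starts, so a match that begins inside an earlier match is never tried.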

You can use a lookahead regex, which checks the rest of the pattern without consuming it:

(aaa)(?=((?:...)+?ccc))


For each match, take captured group #1 and captured group #2 and join them to get the full matched string.


Code:

#!/usr/bin/perl
use strict; use warnings;

my $str = 'aaatttcccxaaattcccxaaattttttcccxaaattttttccc';

while ( $str =~ /(aaa)(?=((?:...)+?ccc))/g ) {
   print "$1$2\n";
}
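
If the positions from the question's output are also needed, they can be recovered from the @- array, which holds the 0-based start offset of each capture group. The following is only a sketch under that assumption; the helper name overlapping_matches is hypothetical, and positions are reported 1-based and inclusive, as in the question.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Hypothetical helper: returns [text, start, end] for every match,
# including matches that overlap an earlier one.
sub overlapping_matches {
    my ($str) = @_;
    my @found;
    while ( $str =~ /(aaa)(?=((?:...)+?ccc))/g ) {
        my $text  = $1 . $2;               # consumed 'aaa' plus the lookahead capture
        my $start = $-[1] + 1;             # $-[1] is the 0-based start of group 1
        my $end   = $-[1] + length $text;  # 1-based position of the last character
        push @found, [ $text, $start, $end ];
    }
    return @found;
}

my $str = 'aaatttcccxaaattcccxaaattttttcccxaaattttttccc';
for my $m ( overlapping_matches($str) ) {
    my ( $text, $start, $end ) = @$m;
    printf "String '%s' of length %d was found at position %d through %d.\n",
        $text, length $text, $start, $end;
}
```

Only 'aaa' is consumed on each iteration, so the next search resumes right after it and can still find an 'aaa' that lies inside the text seen by the previous lookahead.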