Question

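Is there a way to find all possible start positions for a regex match in Perl?

For example, if your regex is "aa" and the text is "aaaa", it should return 0, 1 and 2, rather than just 0 and 2.

Obviously, you could return the first match, delete all characters up to and including its starting character, and run another search, but I'm hoping for something more efficient.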

Answer 1

Use a lookahead:

$ perl -le 'print $-[0] while "aaaa" =~ /a(?=a)/g'

In general, put everything except the first character of the regex inside (?=...).
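A variant of the same idea, offered only as a sketch and not taken from the answer above, is to wrap the entire pattern in the lookahead, which avoids having to split off the first character. The helper name all_starts is hypothetical, and the sketch assumes $re is a trusted pattern string:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answers): returns every offset
# at which $re can start matching in $str, overlapping starts included.
sub all_starts {
    my ($re, $str) = @_;
    my @pos;
    # The lookahead consumes nothing, so every match is zero-length and the
    # /g loop retries from the next character instead of skipping past a match.
    push @pos, $-[0] while $str =~ /(?=$re)/g;
    return @pos;
}

print join(',', all_starts('aa', 'aaaa')), "\n";   # prints 0,1,2
```

Because the whole pattern sits inside (?=...), this form should also cope with patterns whose first element is a group or a character class, where splitting after the first character would not be safe.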

Answer 2

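Update

I thought about this problem some more and came up with this solution using an embedded code block, which is nearly three times faster than the grep solution: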

use 5.010;
use warnings;
use strict;

{my @pos;
 my $push_pos = qr/(?{push @pos, $-[0]})/;

sub with_code {
    my ($re, $str) = @_;
    @pos = ();
    $str =~ /(?:$re)$push_pos(?!)/;
    @pos
}}

And for comparison:

sub with_grep {  # old solution
    my ($re, $str) = @_;
    grep {pos($str) = $_; $str =~ /\G(?:$re)/} 0 .. length($str) - 1;
}

sub with_while { # per Michael Carman's solution, corrected
    my ($re, $str) = @_;
    my @pos;
    while ($str =~ /\G.*?($re)/) {
        push @pos, $-[1];
        pos $str = $-[1] + 1
    }
    @pos
}

sub with_look_ahead {  # a fragile "generic" version of Sean's solution
    my ($re, $str) = @_;
    my ($re_a, $re_b) = split //, $re, 2;
    my @pos;
    push @pos, $-[0] while $str =~ /$re_a(?=$re_b)/g;
    @pos
}

Benchmark and sanity check:

use Benchmark 'cmpthese';

my @arg = qw(aa aaaabbbbbbbaaabbbbbaaa);
my $expect = 7;

for my $sub (qw(grep while code look_ahead)) {  # qw needs explicit parentheses on Perl 5.18+
    no strict 'refs';
    my @got = &{"with_$sub"}(@arg);
    "@got" eq '0 1 2 11 12 19 20' or die "$sub: @got";
}

cmpthese -2 => {
    grep  => sub {with_grep      (@arg) == $expect or die},
    while => sub {with_while     (@arg) == $expect or die},
    code  => sub {with_code      (@arg) == $expect or die},
    ahead => sub {with_look_ahead(@arg) == $expect or die},
};

Which prints:

          Rate  grep while ahead  code
grep   49337/s    --  -20%  -43%  -65%
while  61293/s   24%    --  -29%  -56%
ahead  86340/s   75%   41%    --  -38%
code  139161/s  182%  127%   61%    --
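One caveat with the embedded-code-block approach: the (?!) forces the engine to backtrack through every way the pattern can match, so the code block runs once per alternative rather than once per start position. For a fixed-length pattern such as "aa" that makes no difference, but a pattern with a quantifier (for example a+) can record the same offset several times. The sketch below is a hypothetical variant of with_code (the name with_code_unique is an assumption, not part of the original answer) that drops the duplicates:

```perl
use strict;
use warnings;

{
    my @pos;
    my $push_pos = qr/(?{ push @pos, $-[0] })/;

    # Hypothetical variant of with_code: same technique, but each start
    # offset is reported only once, in order of first appearance.
    sub with_code_unique {
        my ($re, $str) = @_;
        @pos = ();
        # (?!) always fails, so the engine explores every alternative at
        # every start position, running the code block each time.
        $str =~ /(?:$re)$push_pos(?!)/;
        my %seen;
        return grep { !$seen{$_}++ } @pos;
    }
}

print join(',', with_code_unique('a+', 'aaa')), "\n";
```

The extra hash lookup costs some of the speed advantage shown in the benchmark, so it is only worth adding when the pattern can match in more than one way from the same position.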

Answer 3

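I know you asked for a regex, but there is a simple built-in function that does something similar: index (perldoc -f index). From it we can build a simple solution to your immediate question. If you really need a more complicated search than your example, though, this will not work, because index only looks for a literal substring (starting the search at the position given by its third argument).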

#!/usr/bin/env perl

use strict;
use warnings;

my $str = 'aaaa';
my $substr = 'aa';

my $pos = -1;
while (1) {
  $pos = index($str, $substr, $pos + 1);
  last if $pos < 0;
  print $pos . "\n";
}

Answer 4

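You can use a global match together with the pos() function. Note that pos() has to be moved back after each match, otherwise the next search resumes after the end of the match and the overlapping position 1 is skipped:

my $s1 = "aaaa";
my $s2 = "aa";

while ($s1 =~ /\Q$s2\E/g) {
    my $start = pos($s1) - length($s2);   # start offset of this match
    print $start, "\n";
    pos($s1) = $start + 1;                # resume one character later to allow overlaps
}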

Finding all possible start positions of a regex match in Perl, including overlapping matches?
