Question

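In Perl 5.20, I need to run a regular expression on a substring without copying it into a new string. In other words, something equivalent to $str[$to] in C.

The reason is that this runs in a loop: if the string is copied on every iteration, the resulting code is O(n^2), whereas code that does not copy is only O(n). As a result, it is unusable with a 2 MB input string.

Alternatively, I would welcome suggestions on how to rewrite the code: I need to do a search and replace using a lookup table.

Example input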

$str: "abcde"
$tbl: {ab=>"xy", bc=>"rq", e=>"a"}
$reg: qr/ab|bc|e/
expected output: xycda

Here is my current code, which works fine for short strings but never finishes for large ones:

#translate $str: if $reg match found, replace the match with a value in $tbl hash that corresponds to the match
sub internalEncode {
    my ($str, $tbl, $reg) = @_;
    my $res="";
    my $prevTo = 0;
    my $to = 0;
    #the substr($str,$to) makes it slow; in C with 0 terminated strings
    #I would need to write here something like: $str[$to]
    while (substr($str,$to) =~ $reg) {
        my $match = $&;
        my $from = $prevTo + $-[0];
        $to = $prevTo + $+[0];
        $res .= substr($str,$prevTo,$from - $prevTo);
        $res .= $tbl->{quotemeta $match};
        $prevTo = $to;
    }
    $res .= substr($str,$prevTo);
    return $res;
}

Answer 1

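Perl is not C, and using a regular expression is one more step removed from it. So I will break down the steps needed to implement something that may look trivial once you see it, but can be hard to come up with if you are still in a C mindset.

Your example is: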

$str: "abcde"
$tbl: {ab=>"xy", bc=>"rq", e=>"a"}
$reg: qr/ab|bc|e/
expected output: xycda

Let's consider how to implement this for this specific case.

use strict;
use warnings;

my $str = 'abcde';
my $tbl = { ab => "xy", bc => "rq", e => "a"};
my $pat = qr{ ( ab | bc | e ) }x;

$str =~ s/$pat/$tbl->{$1}/g;

print "$str\n";

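That is, we want to match any of three possible substrings. Once a match is found, we capture it and replace it with the corresponding string from the lookup table. The /g modifier makes a single left-to-right pass over the string, so no substring is copied per match.

I built the pattern separately because I want to avoid typing it by hand when it can be derived from the keys of the lookup table: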

use strict;
use warnings;

my $str = 'abcde';
my $tbl = { ab => "xy", bc => "rq", e => "a"};
my $pat = join '|', map quotemeta($_), sort keys %$tbl;

$str =~ s/($pat)/$tbl->{$1}/g;

print "$str\n";
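
One caveat with deriving the pattern from the keys: Perl tries alternatives left to right, so if one key is a prefix of another (say "a" and "ab"), a plain alphabetical sort puts the shorter key first and the longer one never matches. The following is a minimal sketch, not part of the original answer, that sorts keys longest first. The helper names build_pattern and translate are hypothetical, and the early return for an empty table is an assumption about the desired behavior.

```perl
use strict;
use warnings;

# Hypothetical helper: build one capturing alternation from the table's
# keys, longest key first so that "ab" is tried before "a".
sub build_pattern {
    my ($tbl) = @_;
    my $alt = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) or $a cmp $b }
        keys %$tbl;
    return qr/($alt)/;
}

# Hypothetical helper: return a translated copy of $str, leaving the
# caller's string untouched.
sub translate {
    my ($str, $tbl) = @_;
    return $str unless %$tbl;    # assumption: empty table means no change
    my $pat = build_pattern($tbl);
    (my $res = $str) =~ s/$pat/$tbl->{$1}/g;
    return $res;
}

print translate('abab', { a => '1', ab => '2' }), "\n";    # prints 22
```

With the keys from the question (ab, bc, e) no key is a prefix of another, so the ordering makes no difference there.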

Answer 2

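use strict;
use warnings;
use feature 'say';

my $str = "abcde";
my $tbl = {ab=>"xy", bc=>"rq", e=>"a"};
# Build the alternation from the table's keys, escaping regex metacharacters
my $re = join '|', map quotemeta, keys %$tbl;
$re = qr/($re)/;

$str =~ s/$re/$tbl->{$1}/g;
say $str;  # xycda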

If you don't want to modify the original:

my $res = $str =~ s/$re/$tbl->{$1}/rg;     # 5.14+

or

( my $res = $str ) =~ s/$re/$tbl->{$1}/g;
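
For the literal question (matching from an offset without copying the tail), a match with /g in scalar context resumes where the previous match on the same string ended, and that position can be read or set with pos(). The sketch below is one possible rewrite of the question's loop along those lines, not code from either answer. The name encode_with_pos is hypothetical, and it assumes $reg never matches the empty string and that every match is a key of $tbl.

```perl
use strict;
use warnings;

# Hypothetical rewrite of the question's loop: no substr($str, $to) copy,
# because m//g in scalar context continues from pos($str).
sub encode_with_pos {
    my ($str, $tbl, $reg) = @_;
    my $res  = '';
    my $prev = 0;                      # end of the previous match
    while ($str =~ /$reg/g) {
        # @- and @+ hold offsets into $str itself, not into a substring
        my ($from, $to) = ($-[0], $+[0]);
        $res .= substr($str, $prev, $from - $prev);         # unmatched gap
        $res .= $tbl->{ substr($str, $from, $to - $from) }; # replacement
        $prev = $to;
    }
    $res .= substr($str, $prev);       # tail after the last match
    return $res;
}

print encode_with_pos('abcde', { ab => 'xy', bc => 'rq', e => 'a' },
                      qr/ab|bc|e/), "\n";    # prints xycda
```

Unlike the question's code, the lookup uses the raw matched text as the hash key rather than quotemeta of it, since the table keys are stored unescaped. In practice the s///g forms above do the same work with less code.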

Perl: create a string without copying
