Question

Searching and replacing with a large variable in Perl takes a long time.

For example:

$original = 'aaaabc';
$replace = 'bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb' x 1000;
$original =~ s/b/$replace/;

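Once $replace is large enough, the regex can take a very long time. I assume some buffer is being blown and repeatedly expanded.

Are there any suggestions for improving performance?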

Answer 1

How large is large? The replacement happens within a second on my Windows box, even with a string of length ~~30,000,000~~ 30,000,000,000,000,000,000:

> perl -Mstrict -wE "my $start = time;my $str = 'aaaabc'; my $replace = 'b' x 30_000_000_000_000_000_000; $str =~ s/b/$replace/; printf qq<%d s\n>, time - $start;"
0 s

Answer 2

I'm not sure why you are seeing a performance hit. I created a replacement string of more than 50,000 characters and then ran the program as written.

$ time(perl large.pl )

real    0m0.010s
user    0m0.002s
sys     0m0.004s
$

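However, I do have a suggestion. If the replacement string is a finite run of the same character, why not find the specific character in the original string, split the string at that character, then join the two parts to the front and back of the replacement string, and print the result?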
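A minimal sketch of that idea, assuming only the first occurrence should be replaced (as with s/b/$replace/ without the /g flag). The helper name replace_first is hypothetical, and the 50,000-character replacement simply mirrors the size mentioned above. It uses the built-in index and substr instead of the regex engine:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace the first occurrence of $needle in $text
# by splitting the string around it and joining the pieces back together.
sub replace_first {
    my ($text, $needle, $replacement) = @_;

    my $pos = index($text, $needle);
    return $text if $pos < 0;    # needle not found: return the text unchanged

    my $head = substr($text, 0, $pos);                   # everything before the match
    my $tail = substr($text, $pos + length($needle));    # everything after the match

    return $head . $replacement . $tail;
}

my $original = 'aaaabc';
my $replace  = 'b' x 50_000;    # assumed size, for illustration only

my $result = replace_first($original, 'b', $replace);
print length($result), "\n";
```

Whether this is actually faster than the substitution would have to be measured on the affected machine. The timings in this thread suggest the regex itself is not slow for inputs of this size.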

Answer 3

Benchmark gives 0 wallclock secs with your input 

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark;

my $original = 'aaaabcd';
my $replace = 'bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb' x 1000;
my $start_time = Benchmark->new;    # snapshot before the substitution
$original =~ s/b/$replace/;
my $end_time = Benchmark->new;      # snapshot after the substitution
my $diff = timediff($end_time, $start_time);
print "Regex took ", timestr($diff), "\n";

Output

Regex took 0 wallclock secs (0.00 usr + 0.00 sys = 0.00 CPU)
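
Since whole-second resolution reports 0 here, a finer-grained measurement across several replacement sizes may help show where, if anywhere, the slowdown begins. This is only a sketch: the helper name time_substitution and the sizes in the loop are assumptions, and Time::HiRes is used for sub-second timing.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);    # core module; replaces time() with a fractional-second version

# Hypothetical helper: time a single s/b/$replace/ where the replacement
# is $len characters long. Returns the elapsed seconds and the result length.
sub time_substitution {
    my ($len) = @_;

    my $original = 'aaaabc';
    my $replace  = 'b' x $len;

    my $start = time;
    $original =~ s/b/$replace/;
    my $elapsed = time - $start;

    return ($elapsed, length($original));
}

# The sizes are arbitrary assumptions; raise them until a slowdown appears, if it does.
for my $len (1_000, 100_000, 10_000_000) {
    my ($elapsed, $result_len) = time_substitution($len);
    printf "replacement of %d chars: %.6f s (result length %d)\n",
        $len, $elapsed, $result_len;
}
```

If the time grows roughly in proportion to the replacement length, the cost is just the copy into the target string and not anything specific to the regex.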

How can I improve the performance of a Perl regex with very large search-and-replace variables?
