我想以快速的方式在循环中在perl中追加一个字符串,而不必为每次迭代复制字符串。我正在寻找像Java或C#的StringBuilder。
我目前知道以下备选方案,以便做'a + = b'。
我对将所有字符串复制到另一个字符串不感兴趣。我需要每次复制一个字符,或者在迭代时附加小字符串。我试图解决以下问题:将输入字符串'aaabbccc'压缩为'3a2b3c'。所以想法是迭代输入字符串,检查我们有多少重复字符,然后以压缩方式附加到输出。在Perl中执行此操作最有效的是什么?
Here is a link我试图解决的问题。我虽然略有不同。
答案 0 :(得分:4)
为了比较,我尝试测试不同版本以解决压缩字符串的实际问题。这是我的测试脚本test.pl
:
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Inline C => './compress_c.c';
my $str_len = 10000;
my @chars = qw(a b c d);
my $str;
$str .= [@chars]->[rand 4] for 1 .. $str_len;
cmpthese(
-1,
{
compress_array => sub { compress_array( $str ) },
compress_regex => sub { compress_regex( $str ) },
compress_str => sub { compress_str( $str ) },
compress_c => sub { compress_c( $str ) },
}
);
# Suggested by @melpomene in the comments
sub compress_regex {
return $_[0] =~ s/([a-z])\1+/($+[0] - $-[0]) . $1/egr;
}
sub compress_array {
my $result = '';
my @chrs = split //, $_[0];
my $prev = $chrs[0];
my $count = 1;
my @result;
for my $i ( 1..$#chrs ) {
my $char = $chrs[$i];
if ( $prev eq $char ) {
$count++;
next if $i < $#chrs;
}
if ( $count > 1) {
push @result, $count, $prev;
}
else {
push @result, $prev;
}
if ( ( $i == $#chrs ) and ( $prev ne $char ) ) {
push @result, $char;
last;
}
$count = 1;
$prev = $char;
}
return join '', @result;
}
sub compress_str {
my $result = '';
my $prev = substr $_[0], 0, 1;
my $count = 1;
my $lastind = (length $_[0]) - 1;
for my $i (1 .. $lastind) {
my $char = substr $_[0], $i, 1;
if ( $prev eq $char ) {
$count++;
next if $i < $lastind;
}
if ( $count > 1) {
$result .= $count;
}
$result .= $prev;
if ( ( $i == $lastind ) and ( $prev ne $char ) ) {
$result .= $char;
last;
}
$count = 1;
$prev = $char;
}
return $result;
}
其中compress_c.c
是:
SV *compress_c(SV* str_sv) {
STRLEN len;
char* str = SvPVbyte(str_sv, len);
SV* result = newSV(len);
char *buf = SvPVX(result);
char prev = str[0];
int count = 1;
int j = 0;
int i;
for (i = 1; i < len; i++ )
{
char cur = str[i];
if ( prev == cur ) {
count++;
if ( i < (len - 1) )
continue;
}
if ( count > 1) {
buf[j++] = count + '0'; // assume count is less than 10
}
buf[j++] = prev;
if ( (i == (len - 1)) && (prev != cur) ) buf[j++] = cur;
count = 1;
prev = cur;
}
buf[j] = '\0';
SvPOK_on(result);
SvCUR_set(result, j);
return result;
}
运行perl test.pl
:
Rate compress_array compress_str compress_regex compress_c
compress_array 311/s -- -42% -45% -99%
compress_str 533/s 71% -- -6% -98%
compress_regex 570/s 83% 7% -- -98%
compress_c 30632/s 9746% 5644% 5273% --
这表明正则表达式版本比字符串版本略快。但是,C版本速度最快,速度是正则表达式版本的50倍。
注意:我在我的Ubuntu 16.10笔记本电脑(英特尔酷睿i7-7500U CPU @ 2.70GHz)上进行了测试
答案 1 :(得分:2)
我已经通过多种方式执行了以下基准测试:
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
my $dna;
$dna .= [qw(G A T C)]->[rand 4] for 1 .. 10000;
sub frequency_concat {
my $result = '';
for my $idx (0 .. length($dna) - 1) {
$result .= substr($dna, $idx, 1);
}
return $result;
}
sub frequency_join {
my $result = '';
for my $idx (0 .. length($dna) - 1) {
$result = join '', $result, substr($dna,$idx,1);
}
return $result;
}
sub frequency_list_push {
my @result = ();
for my $idx (0 .. length($dna) - 1) {
push @result, substr($dna,$idx,1);
}
return join '', @result;
}
sub frequency_list_prealloc {
my @result = (' ' x length($dna));
for my $idx (0 .. length($dna) - 1) {
$result[$idx] = substr($dna,$idx,1);
}
return join '', @result;
}
cmpthese(-1, # Run each for at least 1 second(s) {
concat => \&frequency_concat,
join => \&frequency_join,
list_push => \&frequency_list_push,
list_list_prealloc => \&frequency_list_prealloc
}
);
以下结果表明concat(a.b)是最快的操作。我不明白为什么,因为这需要制作字符串的几个副本。
Rate join list_push list_list_prealloc concat
join 213/s -- -38% -41% -74%
list_push 342/s 60% -- -5% -58%
list_list_prealloc 359/s 68% 5% -- -56%
concat 822/s 285% 140% 129% --