Filtering with a regex and with .contains in Perl 6

Time: 2017-11-01 09:08:22

Tags: perl6

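I often need to filter an array of strings down to the elements that contain some substring (for example, a single character). Since this can be done either by matching a regex or with the .contains method, I decided to run a small test to confirm that .contains is faster (and therefore the more appropriate choice).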

my @array = "aa" .. "cc";
my constant $substr = 'a';

my $time1 = now;
my @a_array = @array.grep: *.contains($substr);
my $time2 = now;
@a_array = @array.grep: * ~~ /$substr/;
my $time3 = now;

my $time_contains = $time2 - $time1;
my $time_regex    = $time3 - $time2;
say "contains: $time_contains sec";
say "regex:    $time_regex sec";

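I then varied the size of @array and the length of $substr, and compared how long each method took to filter @array. In most cases (as expected), .contains was much faster than the regex, especially when @array was large. But for a small @array (as in the code above), the regex was slightly faster.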

contains: 0.0015010 sec
regex:    0.0008708 sec

Why does this happen?

1 answer:

Answer 0: (score: 4)

In an entirely unscientific experiment I just switched the regex version and the contains version around and found that the difference in performance you're measuring is not "regex vs contains" but in fact "first thing versus second thing":

When contains comes first:

contains: 0.001555  sec
regex:    0.0009051 sec

When regex comes first:

regex:    0.002055 sec
contains: 0.000326 sec

Benchmarking properly is a difficult task. It's really easy to accidentally measure something different from what you wanted to figure out.

When I want to compare the performance of multiple things I will usually run each thing in a separate script, or maybe have a shared script but only run one of the tasks at once (for example using a multi sub MAIN("task1") approach). That way any startup work gets shared.
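As a rough illustration of the shared-script approach, here is a minimal sketch that reuses the question's @array and $substr. The file name bench.p6 and the task names "contains" and "regex" are made up for this example; each invocation runs only one of the two filters, so neither benefits from work already done by the other.

```raku
# Hypothetical file name: bench.p6
# Run each task in its own process:
#   perl6 bench.p6 contains
#   perl6 bench.p6 regex

my @array = "aa" .. "cc";
my constant $substr = 'a';

# Only the .contains version runs when the argument is "contains"
multi sub MAIN("contains") {
    my $start   = now;
    my @a_array = @array.grep: *.contains($substr);
    my $elapsed = now - $start;
    say "contains: $elapsed sec";
}

# Only the regex version runs when the argument is "regex"
multi sub MAIN("regex") {
    my $start   = now;
    my @a_array = @array.grep: * ~~ /$substr/;
    my $elapsed = now - $start;
    say "regex:    $elapsed sec";
}
```

With an array this small a single run is still noisy, so it is probably worth running each task several times and comparing the spread rather than one number.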

In the #perl6 IRC channel on freenode we have a bot called benchable6 which can do benchmarks for you. Read the section "Comparing Code" on its wiki page to find out how it can compare two pieces of code for you.