Perl smartmatch operator or grep to check whether a value exists in an array

Date: 2013-11-25 23:44:24

Tags: perl

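After reading a recent Perl question here about checking whether a value exists in an array, I started thinking about how to do it. Most people seemed to recommend a grep of the form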
if (!grep { $input_day eq $_ } @days ) {
    say "Grep Invalid Day";
}

However, when I read that question, my first instinct was to reach for the smartmatch operator:

unless ( $input_day ~~ @days ) {
    say "Smart Invalid Day";
}

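So it got me wondering whether there is any advantage to using grep instead of smartmatch, or vice versa. I know smartmatch is only available in newer versions of Perl, so it isn't an option for anyone using a Perl older than 5.10.1.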

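I have never really benchmarked code before, so the code below is pieced together from online examples. I ran the smartmatch version 10 million times and the grep version 10 million times and recorded the times.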

use strict;
use warnings;
use v5.16.2;
use Benchmark;

my $input_day = shift;
my @days = qw /mon tue wed thu fri sat sun/;

my $smart_test_start = Benchmark->new;
for (my $x = 0; $x < 10000000; $x++) {
    unless ( $input_day ~~ @days ) {
        # here we would execute some code
    }
}
my $smart_test_end = Benchmark->new;

my $grep_test_start = Benchmark->new;
for (my $y = 0; $y < 10000000; $y++) {
    if ( !grep { $input_day eq $_ } @days ) {
        # here we would execute some code
    }
}
my $grep_test_end = Benchmark->new;

my $smart_diff = timediff($smart_test_end, $smart_test_start);
my $grep_diff = timediff($grep_test_end, $grep_test_start);

say "SMART: ", timestr($smart_diff,'all');
say "GREP: ", timestr($grep_diff,'all');

I used a few different inputs.

Input "mon"

SMART:  3 wallclock secs ( 2.75 usr  0.00 sys +  0.00 cusr  0.00 csys =  2.75 CPU)
GREP: 12 wallclock secs (12.02 usr  0.01 sys +  0.00 cusr  0.00 csys = 12.03 CPU)

Input "thu"

SMART:  6 wallclock secs ( 5.67 usr  0.00 sys +  0.00 cusr  0.00 csys =  5.67 CPU)
GREP: 11 wallclock secs (11.46 usr  0.01 sys +  0.00 cusr  0.00 csys = 11.47 CPU)

Input "sun"

SMART:  8 wallclock secs ( 8.87 usr  0.01 sys +  0.00 cusr  0.00 csys =  8.88 CPU)
GREP: 12 wallclock secs (11.62 usr  0.00 sys +  0.00 cusr  0.00 csys = 11.62 CPU)

Input "non" (not a valid day)

SMART:  9 wallclock secs ( 8.46 usr  0.00 sys +  0.00 cusr  0.00 csys =  8.46 CPU)
GREP: 11 wallclock secs (11.58 usr  0.13 sys +  0.00 cusr  0.00 csys = 11.71 CPU)

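In every case the smartmatch operator appears to perform better than grep. Looking at the results, I assume that for the early-matching inputs this is because smartmatch stops as soon as it finds a match, whereas grep keeps checking the rest of the array after the first match.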
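As an aside on the benchmarking method: the Benchmark module's `cmpthese` runs several candidates and prints a table of relative rates, so start and end timestamps don't have to be managed by hand. The sketch below is an illustration, not the poster's code. It compares `grep`, List::Util's `any` (assumed available; it was added in List::Util 1.33) and a hash lookup, and it leaves smartmatch out because that operator emits experimental warnings from Perl 5.18 on. On a Perl where smartmatch is available you could add an entry such as `smart => sub { my $found = $input_day ~~ @days }`.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use List::Util qw(any);   # any() was added in List::Util 1.33

# Day to look up; defaults to "sun" (the last element) when no argument is given.
my $input_day = @ARGV ? $ARGV[0] : 'sun';
my @days      = qw(mon tue wed thu fri sat sun);
my %days      = map { $_ => 1 } @days;   # built once, outside the timed code

# Each entry reports whether $input_day is in @days.
my %checks = (
    grep => sub { my $found = grep { $input_day eq $_ } @days },
    any  => sub { my $found = any  { $input_day eq $_ } @days },
    hash => sub { my $found = exists $days{$input_day} },
);

# Sanity check: all methods must agree before their speed is compared.
my %answers;
$answers{ $checks{$_}->() ? 'yes' : 'no' }++ for keys %checks;
die "methods disagree\n" if keys %answers > 1;

# A negative count means "run each entry for at least that many CPU seconds".
cmpthese(-1, \%checks);
```

The agreement check matters more than it looks: a benchmark of a method that returns the wrong answer is meaningless, however fast it is.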

I then noticed others recommending certain modules for finding the first instance, and so on.
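The early-exit explanation can be checked directly. This sketch, an illustration rather than code from the thread, counts how many elements `grep` and List::Util's `first` (one "find the first instance" helper of the kind mentioned) examine when the match is the very first element.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @days      = qw(mon tue wed thu fri sat sun);
my $input_day = 'mon';   # the first element: the best case for an early exit

# grep always walks the whole list, even in boolean or scalar context.
my $grep_visits = 0;
my $found_grep  = grep { $grep_visits++; $input_day eq $_ } @days;

# first stops at the first element for which the block returns true.
my $first_visits = 0;
my $match        = first { $first_visits++; $input_day eq $_ } @days;
my $found_first  = defined $match;   # test definedness: a match could be a false value like "0"

print "grep examined $grep_visits of ", scalar(@days), " elements\n";
print "first examined $first_visits of ", scalar(@days), " elements\n";
```

Run as-is, it prints `grep examined 7 of 7 elements` and `first examined 1 of 7 elements`.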

Is there a reason people don't recommend the smartmatch operator? Does smartmatch have limitations or reliability problems?

3 Answers:

Answer 0 (score: 2)

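Do not, I repeat, do NOT use the smartmatch operator in production code. According to the Perl 5.18 release notes (perl5180delta), smartmatch has been marked experimental:

> Smart match, added in v5.10.0 and significantly revised in v5.10.1, has been a regular point of complaint. Although there are a number of ways in which it is useful, it has also proven problematic and confusing for both users and implementors of Perl. There have been a number of proposals on how to best address the problem. It is clear that smartmatch is almost certainly either going to change or go away in the future. Relying on its current behavior is not recommended.
>
> Warnings will now be issued when the parser sees ~~, given, or when. To disable these warnings, you can add this line to the appropriate scope:
>
> no if $] >= 5.018, warnings => "experimental::smartmatch";
>
> However, consider replacing the use of these features, as they may change behavior again before becoming stable.

This means that code relying on this feature cannot be considered stable until these issues are resolved.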
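For code that uses `$x ~~ @array` only to test string membership, a plain loop works as a replacement that needs no experimental features. `in_list` is a hypothetical helper name. It deliberately always compares with `eq`, whereas smartmatch picks its comparison from the operand types.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $needle is string-equal (eq) to any element of
# the array referenced by $haystack. It returns as soon as it finds a match and
# relies only on core Perl 5, with no experimental features.
sub in_list {
    my ($needle, $haystack) = @_;
    for my $item (@$haystack) {
        return 1 if defined $item && $item eq $needle;
    }
    return 0;
}

my @days = qw(mon tue wed thu fri sat sun);
for my $input_day (qw(thu non)) {
    my $msg = in_list($input_day, \@days) ? "$input_day: valid\n" : "$input_day: Invalid Day\n";
    print $msg;
}
```

Passing the array by reference avoids copying the whole list on every call.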

Answer 1 (score: 2)

The correct solution here is to use a hash instead of an array:

my %days = map { $_ => 1 } @days;

Then you can write

unless ($days{$input_day}) {
  say "Hash Invalid Day";
}

and the performance will far exceed any of the other solutions.

(I hope this is obvious, but you should build the hash only once and then keep using it for all of your tests.)
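Here is the hash approach as a complete script, a sketch under the answer's assumptions: the lookup table (named `%is_day` here purely for illustration) is built once, and each check uses `exists`, which keeps working even if the stored values could be false.

```perl
use strict;
use warnings;

my @days = qw(mon tue wed thu fri sat sun);

# Build the lookup table once; every later check is a single hash lookup
# instead of a scan over @days.
my %is_day = map { $_ => 1 } @days;

for my $input_day (qw(mon sun non)) {
    if (exists $is_day{$input_day}) {
        print "$input_day: valid day\n";
    }
    else {
        print "$input_day: Hash Invalid Day\n";
    }
}
```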

Answer 2 (score: 0)

I wanted to run some tests to build up my own experience. I have always used smartmatch and recently got tired of the warnings it produces.

I have a text file of 100 million 10-character strings.

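The Perl script reads STDIN into an array and runs three common methods to find whether a string exists in the array. I tried a hash as suggested above, but building the hash took 3 times as long as building the array. If you are going to do a lot of existence tests, this trade-off can be worth it, because an existence check on a hash is essentially instant. It also depends on your data source.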

Going forward, I plan to mainly use List::Util (any), because as a core module it is future-proof, and its performance is consistent.

#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(any);

my @arr = qw(a b c d e);
if ( any { $_ eq 'd' } @arr ) {
    print "Found.\n";
}

Methods:

List::Util (any): if ( any { $_ eq $needle } @arr ) { ... }
Perl Smartmatch:  if ( $needle ~~ @arr ) { ... }
Grep:             if ( grep { $needle eq $_ } @arr ) { ... }

I searched for values that I knew existed at positions 1, 10, 100, 1000, 10000, 100000, 1000000, 10000000, and 100000000. Timing was done with the Time::HiRes module.
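The answer does not show its timing harness. A minimal sketch of timing individual lookups with Time::HiRes might look like the following. The generated array (100,000 strings `s0` to `s99999`) is a small stand-in for the answer's 100-million-line file, and `time_search` is a hypothetical helper. Single wall-clock measurements like these are noisy, so treat them as rough indications.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);   # time() with sub-second resolution
use List::Util qw(any);     # any() needs List::Util 1.33 or newer

# Hypothetical stand-in for the answer's input file: 100,000 strings "s0" .. "s99999".
my @arr = map "s$_", 0 .. 99_999;

# Run one membership check and return (found, elapsed seconds).
sub time_search {
    my ($code, $needle) = @_;
    my $start   = time();
    my $found   = $code->($needle) ? 1 : 0;
    my $elapsed = time() - $start;
    return ($found, $elapsed);
}

my %methods = (
    any  => sub { my $n = shift; any { $_ eq $n } @arr },
    grep => sub { my $n = shift; scalar grep { $_ eq $n } @arr },
);

# Look up values at 1-based positions 1, 10, 100, ..., mirroring the answer's layout.
for my $pos (1, 10, 100, 1_000, 10_000, 100_000) {
    my $needle = $arr[$pos - 1];
    for my $name (sort keys %methods) {
        my ($found, $secs) = time_search($methods{$name}, $needle);
        printf "%-4s position %6d found=%d time=%.6f s\n", $name, $pos, $found, $secs;
    }
}
```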

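What I found is that if most of your values are near the beginning of the array, smartmatch outperforms the List::Util method. However, if most values are in the middle or at the end, or are not in the array at all, List::Util outperforms smartmatch. Grep appears to do an exhaustive search whether or not the value is found.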

More output details:

Smart Match total: 5.939 seconds.
List::Util::any: 7.332 seconds.
Grep total: 39.553 seconds.

Array Generation Time: 30.315 seconds. Searching 100000000 arr elements.

any Searching eavTa2eWr1 any Found - eavTa2eWr1. Time: 0.540 seconds.
any Searching mhEusMj5E7 any Found - mhEusMj5E7. Time: 0.358 seconds.
any Searching WGwHfJICK6 any Found - WGwHfJICK6. Time: 0.364 seconds.
any Searching I48fNDYNKF any Found - I48fNDYNKF. Time: 0.359 seconds.
any Searching q3YVBTmX9J any Found - q3YVBTmX9J. Time: 0.357 seconds.
any Searching pw0J5vRCnW any Found - pw0J5vRCnW. Time: 0.358 seconds.
any Searching GNJP5flX5z any Found - GNJP5flX5z. Time: 0.392 seconds.
any Searching 3Mh0x0R3OC any Found - 3Mh0x0R3OC. Time: 0.649 seconds.
any Searching H5yxSA7eDx any Found - H5yxSA7eDx. Time: 3.473 seconds.
List::Util::any: 6.850 seconds.

###############################################################

SM Searching eavTa2eWr1 SM Found eavTa2eWr1. Time: 0.000 seconds.
SM Searching mhEusMj5E7 SM Found mhEusMj5E7. Time: 0.000 seconds.
SM Searching WGwHfJICK6 SM Found WGwHfJICK6. Time: 0.000 seconds.
SM Searching I48fNDYNKF SM Found I48fNDYNKF. Time: 0.000 seconds.
SM Searching q3YVBTmX9J SM Found q3YVBTmX9J. Time: 0.001 seconds.
SM Searching pw0J5vRCnW SM Found pw0J5vRCnW. Time: 0.005 seconds.
SM Searching GNJP5flX5z SM Found GNJP5flX5z. Time: 0.054 seconds.
SM Searching 3Mh0x0R3OC SM Found 3Mh0x0R3OC. Time: 0.519 seconds.
SM Searching H5yxSA7eDx SM Found H5yxSA7eDx. Time: 5.083 seconds.
Smart Match total: 5.662 seconds.

###############################################################

Grep Searching eavTa2eWr1 Grep Found eavTa2eWr1. Time: 4.648 seconds.
Grep Searching mhEusMj5E7 Grep Found mhEusMj5E7. Time: 4.546 seconds.
Grep Searching WGwHfJICK6 Grep Found WGwHfJICK6. Time: 4.295 seconds.
Grep Searching I48fNDYNKF Grep Found I48fNDYNKF. Time: 4.262 seconds.
Grep Searching q3YVBTmX9J Grep Found q3YVBTmX9J. Time: 4.282 seconds.
Grep Searching pw0J5vRCnW Grep Found pw0J5vRCnW. Time: 4.462 seconds.
Grep Searching GNJP5flX5z Grep Found GNJP5flX5z. Time: 4.420 seconds.
Grep Searching 3Mh0x0R3OC Grep Found 3Mh0x0R3OC. Time: 4.185 seconds.
Grep Searching H5yxSA7eDx Grep Found H5yxSA7eDx. Time: 4.112 seconds.
Grep total: 39.214 seconds.
Done.

Checking values that don't exist:

List::Util::any: 28.980 seconds.
Grep total: 34.790 seconds.
Smart Match total: 42.913 seconds.

Array Generation Time: 30.909 seconds. Searching 100000000 arr elements.

any Searching eavTa2eWr1l Time: 3.264 seconds.
any Searching mhEusMj5E7l Time: 3.404 seconds.
any Searching WGwHfJICK6l Time: 3.291 seconds.
any Searching I48fNDYNKFl Time: 3.240 seconds.
any Searching q3YVBTmX9Jl Time: 3.083 seconds.
any Searching pw0J5vRCnWl Time: 3.247 seconds.
any Searching GNJP5flX5zl Time: 3.180 seconds.
any Searching 3Mh0x0R3OCl Time: 3.028 seconds.
any Searching H5yxSA7eDxl Time: 3.243 seconds.
List::Util::any: 28.980 seconds.

###############################################################

SM Searching eavTa2eWr1l Time: 4.620 seconds.
SM Searching mhEusMj5E7l Time: 4.783 seconds.
SM Searching WGwHfJICK6l Time: 4.899 seconds.
SM Searching I48fNDYNKFl Time: 4.902 seconds.
SM Searching q3YVBTmX9Jl Time: 4.863 seconds.
SM Searching pw0J5vRCnWl Time: 4.646 seconds.
SM Searching GNJP5flX5zl Time: 4.751 seconds.
SM Searching 3Mh0x0R3OCl Time: 4.666 seconds.
SM Searching H5yxSA7eDxl Time: 4.782 seconds.
Smart Match total: 42.913 seconds.

###############################################################

Grep Searching eavTa2eWr1l Time: 4.034 seconds.
Grep Searching mhEusMj5E7l Time: 3.849 seconds.
Grep Searching WGwHfJICK6l Time: 3.837 seconds.
Grep Searching I48fNDYNKFl Time: 3.822 seconds.
Grep Searching q3YVBTmX9Jl Time: 3.923 seconds.
Grep Searching pw0J5vRCnWl Time: 3.825 seconds.
Grep Searching GNJP5flX5zl Time: 3.994 seconds.
Grep Searching 3Mh0x0R3OCl Time: 3.846 seconds.
Grep Searching H5yxSA7eDxl Time: 4.174 seconds.
Grep total: 35.303 seconds.
Done.