Perl regex with negative lookahead behaves unexpectedly

Asked: 2013-04-10 22:14:51

Tags: regex perl negative-lookahead

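I'm trying to match /ezmlm-(any word except 'weed' or 'return')\s+/ with a regex. The following demonstrates a foreach loop that does the right thing, and an attempted regex that almost does: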

#!/usr/bin/perl
use strict;
use warnings;

my @tests = (
    {  msg => "want 'yes', string has ezmlm, but not weed or return",
       str => q[|/usr/local/bin/ezmlm-reject '<snip>'],
    },
    {  msg => "want 'yes', array  has ezmlm, but not weed or return",
       str => [ <DATA> ],
    },
    {  msg => "want 'no' , has ezmlm-weed",
       str => q[|/usr/local/bin/ezmlm-weed '<snip>'],
    },
    {  msg => "want 'no' , doesn't have ezmlm-anything",
       str => q[|/usr/local/bin/else '<snip>'],
    },
    {  msg => "want 'no' , ezmlm email pattern",
       str => q[crazy/but/legal/ezmlm-wacky@example.org],
    },
);

print "foreach regex\n";
foreach ( @tests ) {
    print doit_fe( ref $_->{str} ? @{$_->{str}} : $_->{str} ) ? "yes" : "no";
    print "\t";
    print doit_re( ref $_->{str} ? @{$_->{str}} : $_->{str} ) ? "yes" : "no";
    print "\t<--- $_->{msg}\n";
};

# for both of the following subs:
#   @_ will contain one or more lines of data
#   match the pattern /ezmlm-(any word except 'weed' or 'return')\s+/

sub doit_fe {
    my $has_ezmlm = 0;
    foreach ( @_ ) {
        next if $_ !~ m/ezmlm-(.*?)\s/;
        return 0 if $1 eq 'weed' or $1 eq 'return';
        $has_ezmlm++;
    };
    return $has_ezmlm;
};

sub doit_re { return grep /ezmlm-(?!weed|return)/, @_; };

__DATA__
|/usr/local/bin/ezmlm-reject '<snip>'
|/usr/local/bin/ezmlm-issubn '<snip>'
|/usr/local/bin/ezmlm-send '<snip>'
|/usr/local/bin/ezmlm-archive '<snip>'
|/usr/local/bin/ezmlm-warn '<snip>'

The output of the sample program is:

foreach regex
yes yes <--- want 'yes', string has ezmlm, but not weed or return
yes yes <--- want 'yes', array  has ezmlm, but not weed or return
no  no  <--- want 'no' , has ezmlm-weed
no  no  <--- want 'no' , doesn't have ezmlm-anything
no  yes <--- want 'no' , ezmlm email pattern

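In the last case the regex gives the wrong answer: it matches a goofy but legal email address. If I amend the regex, placing a \s+ after the negative lookahead like so: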

grep /ezmlm-(?!weed|return)\s+/

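the regex fails to match at all. I assume it has something to do with how the negative lookahead works. I've tried making the negation non-greedy, but there seems to be some lesson buried in 'perldoc perlre' that is escaping me. Is it possible to do this with a single regex?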

1 answer:

Answer 0 (score: 4)

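Negative lookaheads are zero-width: they assert something about the text that follows without consuming any of it. That means the regex

/ezmlm-(?!weed|return)\s+/

will match only if one or more whitespace characters immediately follow "ezmlm-".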
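A minimal sketch of the zero-width behaviour, assuming two sample strings (the second, with a space right after the hyphen, is contrived purely for illustration and does not come from the question):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The lookahead consumes nothing, so \s+ has to match
# at the position directly after the hyphen.
my $re = qr/ezmlm-(?!weed|return)\s+/;

my @samples = (
    "ezmlm-reject '<snip>'",    # a word follows the hyphen
    "ezmlm- reject '<snip>'",   # whitespace follows the hyphen (contrived)
);

for my $s (@samples) {
    printf "%-26s %s\n", $s, ( $s =~ $re ? "match" : "no match" );
}
```

Only the contrived second string matches, which is why the amended regex in the question rejects every real ezmlm command line.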

The pattern

/ezmlm-(?!weed|return)/

will match

"crazy/but/legal/ezmlm-wacky@example.org"

because it contains an "ezmlm-" that is not followed by "weed" or "return".
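To see what that pattern actually captures, here is a small sketch that wraps it in a capture group (the extra parentheses are added only for inspection and are not part of the original pattern):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# q[] is used so that @example is not interpolated as an array.
my $addr = q[crazy/but/legal/ezmlm-wacky@example.org];

if ( $addr =~ /(ezmlm-(?!weed|return))/ ) {
    # The lookahead adds no characters to the match, so $1 is just the prefix.
    print "matched text:   '$1'\n";
    # $+[0] is the offset where the whole match ended.
    print "left unmatched: '", substr( $addr, $+[0] ), "'\n";
}
```

The matched text is only "ezmlm-"; nothing in the pattern requires a word or whitespace after it, so the email address is accepted.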

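Try

/ezmlm-(?!weed|return)\S+\s+/

where \S+ is one or more non-whitespace characters. If you want to reject email addresses even when they are followed by whitespace, use [^@\s]+ in place of \S+.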
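A short sketch comparing the two variants on strings taken from the question; the last string, an email address followed by extra text, is a hypothetical case added to show where the variants differ, and the variable names are illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $loose  = qr/ezmlm-(?!weed|return)\S+\s+/;      # any non-whitespace word
my $strict = qr/ezmlm-(?!weed|return)[^@\s]+\s+/;  # word may not contain '@'

my @lines = (
    q[|/usr/local/bin/ezmlm-reject '<snip>'],
    q[|/usr/local/bin/ezmlm-weed '<snip>'],
    q[|/usr/local/bin/else '<snip>'],
    q[crazy/but/legal/ezmlm-wacky@example.org],
    q[crazy/but/legal/ezmlm-wacky@example.org trailing],    # hypothetical
);

# Columns: result with \S+, result with [^@\s]+, input line.
for my $line (@lines) {
    printf "%-5s %-5s %s\n",
        ( $line =~ $loose  ? "yes" : "no" ),
        ( $line =~ $strict ? "yes" : "no" ),
        $line;
}
```

Both variants accept the ezmlm-reject line and reject the next three. Only the last line separates them: \S+ swallows the whole address and then finds the trailing space, while [^@\s]+ stops at the "@" and fails. Note that the lookahead checks only a prefix, so a command such as ezmlm-weeder would also be rejected by either variant.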