Using grep -P

Time: 2016-06-16 15:01:39

Tags: regex linux grep pcre

I am testing the following negative lookbehind assertion and would like to understand the result:

echo "foo foofoo" | grep -Po '(?<!foo)foo'

It prints

foo
foo
foo

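I expected only the first two foo in the input "foo foofoo" to be printed, and not the third, because my assertion is supposed to mean: match 'foo' that is not immediately preceded by 'foo'.

What am I missing? Why does the third foo match?

Note: grep -P means the regular expression is interpreted as a Perl-compatible regular expression, and grep -o means only the matched strings are printed. My grep is version 2.5.1.

2 Answers:

Answer 0: (score: 1)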

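After a long discussion about this question (since moved to chat), I have concluded that my understanding of the negative lookbehind assertion was correct: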

echo "foo foofoo" | grep -Po '(?<!foo)foo'

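should return foo twice.

Either my version of grep, or the PCRE library it was compiled against, is buggy.

Several people tested this command on their own machines with different versions of grep and got different results. Some saw two foo; others saw three, like me.

I tested that regex with Perl and got the expected result: foo twice.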

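The grep man page states that the -P option is experimental.

My lesson learned: if you want PCRE that really works, use Perl.

Answer 1: (score: 0)

I can't reproduce this. Running the exact command, I get only two matches.

I am using GNU grep 2.6.3.

However, I have found a useful trick for troubleshooting regular expressions: Perl lets you run a regex in debug mode with use re 'debug':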

#!/usr/bin/env perl
use strict;
use warnings;

#dump results
use Data::Dumper;

#set regex into debug mode
use re 'debug'; 

#iterate __DATA__ below
while ( <DATA> ) {
    #apply regex to current line
    my @matches = m/(?<!foo)(foo)/g;
    print Dumper \@matches;

}    

__DATA__
foo foofoo

This gives us the output:

Compiling REx "(?<!foo)(foo)"
Final program:
   1: UNLESSM[-3] (7)
   3:   EXACT <foo> (5)
   5:   SUCCEED (0)
   6: TAIL (7)
   7: OPEN1 (9)
   9:   EXACT <foo> (11)
  11: CLOSE1 (13)
  13: END (0)
anchored "foo" at 0 (checking anchored) minlen 3 
Matching REx "(?<!foo)(foo)" against "foo foofoo"
Intuit: trying to determine minimum start position...
  doing 'check' fbm scan, [0..10] gave 0
  Found anchored substr "foo" at offset 0 (rx_origin now 0)...
  (multiline anchor test skipped)
Intuit: Successfully guessed: match at offset 0
   0 <> <foo foofoo>         |  1:UNLESSM[-3](7)
   0 <> <foo foofoo>         |  7:OPEN1(9)
   0 <> <foo foofoo>         |  9:EXACT <foo>(11)
   3 <foo> < foofoo>         | 11:CLOSE1(13)
   3 <foo> < foofoo>         | 13:END(0)
Match successful!
Matching REx "(?<!foo)(foo)" against " foofoo"
Intuit: trying to determine minimum start position...
  doing 'check' fbm scan, [3..10] gave 4
  Found anchored substr "foo" at offset 4 (rx_origin now 4)...
  (multiline anchor test skipped)
  try at offset...
Intuit: Successfully guessed: match at offset 4
   4 <foo > <foofoo>         |  1:UNLESSM[-3](7)
   1 <f> <oo foofoo>         |  3:  EXACT <foo>(5)
                                    failed...
   4 <foo > <foofoo>         |  7:OPEN1(9)
   4 <foo > <foofoo>         |  9:EXACT <foo>(11)
   7 <foo foo> <foo>         | 11:CLOSE1(13)
   7 <foo foo> <foo>         | 13:END(0)
Match successful!
Matching REx "(?<!foo)(foo)" against "foo"
Intuit: trying to determine minimum start position...
  doing 'check' fbm scan, [7..10] gave 7
  Found anchored substr "foo" at offset 7 (rx_origin now 7)...
  (multiline anchor test skipped)
Intuit: Successfully guessed: match at offset 7
   7 <foo foo> <foo>         |  1:UNLESSM[-3](7)
   4 <foo > <foofoo>         |  3:  EXACT <foo>(5)
   7 <foo foo> <foo>         |  5:  SUCCEED(0)
                                    subpattern success...
                                  failed...
Match failed