I am testing the following negative lookbehind assertion and I would like to understand the result:
echo "foo foofoo" | grep -Po '(?<!foo)foo'
It prints:
foo
foo
foo
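I expected only the first two foo of the echoed string to be printed, not the third, because my assertion is supposed to mean: find a 'foo' that is not preceded by 'foo'.
What am I missing? Why does the third foo match?
Note: grep -P means the regular expression is interpreted as a Perl-compatible regular expression, and grep -o means only the matched strings are printed. My grep is version 2.5.1.
Answer 0: (score: 1)
After a long discussion on this question (since moved to chat), I have come to the conclusion that my understanding of the negative lookbehind assertion was correct: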
echo "foo foofoo" | grep -Po '(?<!foo)foo'
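should return foo twice.
Either my version of grep or the PCRE library it was compiled against is buggy.
Several people tested this command on their own machines with different versions of grep and got different results. Some saw two foo; others saw three, as I did.
I tested that regex with Perl and got the expected result: foo twice.
The grep man page states that the -P option is experimental.
Lesson learned: if you want PCRE that really works, use Perl.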
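As a cross-check that does not depend on grep at all, here is a minimal sketch using Python's re module. Python is my own choice for illustration and is a different engine from PCRE, so it only shows what the assertion is meant to do, not how any particular grep build behaves. The names text, pattern and matches are made up for this example.

```python
import re

# Same input and pattern as the grep command above.
text = "foo foofoo"
pattern = re.compile(r"(?<!foo)foo")

# Record the start offset of each match alongside the matched text.
matches = [(m.start(), m.group()) for m in pattern.finditer(text)]

print(matches)  # prints [(0, 'foo'), (4, 'foo')]
```

The candidate at offset 7 is rejected because the three characters before it are "foo", which is the behaviour I expected from grep -P.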
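Answer 1: (score: 0)
I cannot reproduce this. Running the exact command, I get only two matches.
I am using GNU grep 2.6.3.
However, I have found a useful trick for troubleshooting regular expressions: Perl lets you run a regex in debug mode: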
#!/usr/bin/env perl
use strict;
use warnings;
#dump results
use Data::Dumper;
#set regex into debug mode
use re 'debug';
#iterate __DATA__ below
while ( <DATA> ) {
#apply regex to current line
my @matches = m/(?<!foo)(foo)/g;
print Dumper \@matches;
}
__DATA__
foo foofoo
This gives us the following output:
Compiling REx "(?<!foo)(foo)"
Final program:
1: UNLESSM[-3] (7)
3: EXACT <foo> (5)
5: SUCCEED (0)
6: TAIL (7)
7: OPEN1 (9)
9: EXACT <foo> (11)
11: CLOSE1 (13)
13: END (0)
anchored "foo" at 0 (checking anchored) minlen 3
Matching REx "(?<!foo)(foo)" against "foo foofoo"
Intuit: trying to determine minimum start position...
doing 'check' fbm scan, [0..10] gave 0
Found anchored substr "foo" at offset 0 (rx_origin now 0)...
(multiline anchor test skipped)
Intuit: Successfully guessed: match at offset 0
0 <> <foo foofoo> | 1:UNLESSM[-3](7)
0 <> <foo foofoo> | 7:OPEN1(9)
0 <> <foo foofoo> | 9:EXACT <foo>(11)
3 <foo> < foofoo> | 11:CLOSE1(13)
3 <foo> < foofoo> | 13:END(0)
Match successful!
Matching REx "(?<!foo)(foo)" against " foofoo"
Intuit: trying to determine minimum start position...
doing 'check' fbm scan, [3..10] gave 4
Found anchored substr "foo" at offset 4 (rx_origin now 4)...
(multiline anchor test skipped)
try at offset...
Intuit: Successfully guessed: match at offset 4
4 <foo > <foofoo> | 1:UNLESSM[-3](7)
1 <f> <oo foofoo> | 3: EXACT <foo>(5)
failed...
4 <foo > <foofoo> | 7:OPEN1(9)
4 <foo > <foofoo> | 9:EXACT <foo>(11)
7 <foo foo> <foo> | 11:CLOSE1(13)
7 <foo foo> <foo> | 13:END(0)
Match successful!
Matching REx "(?<!foo)(foo)" against "foo"
Intuit: trying to determine minimum start position...
doing 'check' fbm scan, [7..10] gave 7
Found anchored substr "foo" at offset 7 (rx_origin now 7)...
(multiline anchor test skipped)
Intuit: Successfully guessed: match at offset 7
7 <foo foo> <foo> | 1:UNLESSM[-3](7)
4 <foo > <foofoo> | 3: EXACT <foo>(5)
7 <foo foo> <foo> | 5: SUCCEED(0)
subpattern success...
failed...
Match failed
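To read the trace: at each candidate offset the UNLESSM[-3] step looks three characters back and tries EXACT <foo> there. It has nothing to compare at offset 0, fails to find "foo" behind offset 4 (so the match proceeds), and succeeds at offset 7 (so the match is rejected). The sketch below replays that check by hand in Python. It is only an illustration under my own assumptions, not how the Perl engine is implemented, and lookbehind_walk is a hypothetical helper name.

```python
def lookbehind_walk(text, word="foo"):
    """For every occurrence of word, report what sits directly before it."""
    results = []
    start = text.find(word)
    while start != -1:
        # The slice of up to len(word) characters just before the candidate.
        preceding = text[max(0, start - len(word)):start]
        # A negative lookbehind accepts the candidate only if that slice is not word.
        accepted = preceding != word
        results.append((start, preceding, accepted))
        start = text.find(word, start + 1)
    return results


for start, preceding, accepted in lookbehind_walk("foo foofoo"):
    print(start, repr(preceding), accepted)
# prints:
# 0 '' True
# 4 'oo ' True
# 7 'foo' False
```

The three rows line up with the three "Matching REx" blocks in the trace: two successful matches followed by "Match failed".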