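Why doesn't the first print statement output what I expect:
first = This is a test string, sec = This is a test string
Since both * and + are greedy, why doesn't the inner * (the one inside the "((" of the first match) consume the entire string?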
use strict;
use warnings;
my $string = "This is a test string";
$string =~ /((.*)*)/;
print "first = $1, sec = $2\n"; #prints "first = This is a test string, sec ="
$string =~ /((.+)*)/;
print "first = $1, sec = $2\n"; #prints "first = This is a test string, sec = This is a test string"
Answer 0 (score: 17)
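In the first regex, .* matches twice. The first time it matches the whole string. The second time it matches the empty string at the end, because .* can match the empty string when nothing else is left to match.
This does not happen with the other regex, because .+ cannot match the empty string.
Edit: as for what ends up where: $2 contains whatever was matched the last time .* / .+ was applied. $1 contains whatever (.*)* / (.+)* matched, i.e. the whole string.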
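The offsets of the last inner match make this visible. Below is a minimal sketch, assuming the same test string as the question. The helper name group2_span is made up for illustration; it relies on the built-in @- and @+ arrays, which hold the start and end offsets of each capture group from the last successful match.

```perl
use strict;
use warnings;

my $string = "This is a test string";

# Hypothetical helper: returns [start, end] offsets of group 2 for a pattern.
sub group2_span {
    my ($re) = @_;
    $string =~ $re or return;
    return [ $-[2], $+[2] ];
}

my $star = group2_span(qr/((.*)*)/);
my $plus = group2_span(qr/((.+)*)/);

printf "star: group 2 spans %d..%d\n", @$star;   # 21..21, the empty match at the end
printf "plus: group 2 spans %d..%d\n", @$plus;   # 0..21, the whole string
```

With .* the span is empty and sits at the end of the string, which is the second, zero-length iteration overwriting the first one.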
Answer 1 (score: 14)
Running it with "use re 'debug'" produces:
Compiling REx "((.*)*)"
Final program:
1: OPEN1 (3)
3: CURLYX[0] {0,32767} (12)
5: OPEN2 (7)
7: STAR (9) # <====
8: REG_ANY (0)
9: CLOSE2 (11)
11: WHILEM[1/1] (0)
12: NOTHING (13)
13: CLOSE1 (15)
15: END (0)
minlen 0
Matching REx "((.*)*)" against "This is a test string"
0 <> <This is a > | 1:OPEN1(3)
0 <> <This is a > | 3:CURLYX[0] {0,32767}(12)
0 <> <This is a > | 11: WHILEM[1/1](0)
whilem: matched 0 out of 0..32767
0 <> <This is a > | 5: OPEN2(7)
0 <> <This is a > | 7: STAR(9) # <====
REG_ANY can match 21 times out of 2147483647...
21 < test string> <> | 9: CLOSE2(11)
21 < test string> <> | 11: WHILEM[1/1](0)
whilem: matched 1 out of 0..32767
21 < test string> <> | 5: OPEN2(7)
21 < test string> <> | 7: STAR(9) # <====
# This is where the outputs really start to diverge
# --------------------------------------------------------------------------------------------
REG_ANY can match 0 times out of 2147483647...
21 < test string> <> | 9: CLOSE2(11) # <==== Succeeded
21 < test string> <> | 11: WHILEM[1/1](0)
whilem: matched 2 out of 0..32767
whilem: empty match detected, trying continuation...
# --------------------------------------------------------------------------------------------
21 < test string> <> | 12: NOTHING(13)
21 < test string> <> | 13: CLOSE1(15)
21 < test string> <> | 15: END(0)
Match successful!
Compiling REx "((.+)*)"
Final program:
1: OPEN1 (3)
3: CURLYX[0] {0,32767} (12)
5: OPEN2 (7)
7: PLUS (9) # <====
8: REG_ANY (0)
9: CLOSE2 (11)
11: WHILEM[1/1] (0)
12: NOTHING (13)
13: CLOSE1 (15)
15: END (0)
minlen 0
Matching REx "((.+)*)" against "This is a test string"
0 <> <This is a > | 1:OPEN1(3)
0 <> <This is a > | 3:CURLYX[0] {0,32767}(12)
0 <> <This is a > | 11: WHILEM[1/1](0)
whilem: matched 0 out of 0..32767
0 <> <This is a > | 5: OPEN2(7)
0 <> <This is a > | 7: PLUS(9) # <====
REG_ANY can match 21 times out of 2147483647...
21 < test string> <> | 9: CLOSE2(11)
21 < test string> <> | 11: WHILEM[1/1](0)
whilem: matched 1 out of 0..32767
21 < test string> <> | 5: OPEN2(7)
21 < test string> <> | 7: PLUS(9) # <====
# This is where the outputs really start to diverge
# ------------------------------------------------------------------------------------
REG_ANY can match 0 times out of 2147483647...
failed... # <==== Failed
whilem: failed, trying continuation...
# ------------------------------------------------------------------------------------
21 < test string> <> | 12: NOTHING(13)
21 < test string> <> | 13: CLOSE1(15)
21 < test string> <> | 15: END(0)
Match successful!
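To reproduce a trace like the one above for a single match, the pragma can be confined to a block, since use re 'debug' is lexically scoped. This is a minimal sketch, assuming the question's test string; the trace goes to STDERR, and its exact format varies between Perl versions.

```perl
use strict;
use warnings;

my $string = "This is a test string";
my $matched;

{
    # Only regexes compiled and run inside this block are traced.
    use re 'debug';
    $matched = $string =~ /((.*)*)/;
}

# Outside the block, matching is silent again.
print "matched\n" if $matched && $string =~ /((.+)*)/;
```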
Answer 2 (score: 3)
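The problem with the first regex is a combination of two facts: ()* only saves the last match, and .* matches the empty string (i.e. nothing). So, given
"aaab" =~ /(.)*/;
$1 will be "b". If you combine that behavior with the fact that .* matches the empty string, you can see that the inner capture has two matches: "This is a test string" and "". Since the empty string came last, it is the one saved to $2. $1 is the whole capture, so it is equivalent to "This is a test string" . "". The second case works as expected because .+ does not match the empty string.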
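The "last iteration wins" rule can be checked with a small sketch. It is only an illustration: the variable names are made up, and the /g match in list context is just one way to collect every iteration instead of only the last.

```perl
use strict;
use warnings;

my $text = "aaab";

# A quantified capture group keeps only its final iteration.
my ($last) = $text =~ /(.)*/;
print "last = $last\n";            # prints "last = b"

# Without the quantifier, /g in list context returns one capture per match.
my @all = $text =~ /(.)/g;
print "all = @all\n";              # prints "all = a a a b"
```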
Answer 3 (score: 3)
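I don't have an answer, but I do have a different way of framing the question, using simpler and perhaps more realistic regular expressions.
The first two examples behave exactly as I would expect: .* consumes the entire string, and the regex returns a list with just one element. But the third regex returns a list with 2 elements.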
use strict;
use warnings;
use Data::Dumper;
$_ = "foo";
print Dumper( [ /^(.*)/g ] ); # ('foo') As expected.
print Dumper( [ /.(.*)/g ] ); # ('oo') As expected.
print Dumper( [ /(.*)/g ] ); # ('foo', '') Why?
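So far, many of the answers have emphasized that .* will match anything. While that is true, it does not get to the heart of the question, which is this: why is the regex engine still hunting after .* has consumed the entire string? In other circumstances (such as the first two examples), .* does not throw in an extra empty string for good measure.
Update after the helpful comments from Chas. Owens. The first evaluation of any of the three examples results in .* matching the entire string. If we could intervene and call pos() at that moment, the engine would indeed be at the end of the string (at least as we perceive the string; see Chas.'s comments for more insight on this). However, the /g option tells Perl to try to match the entire regex again. For examples #1 and #2 that second attempt fails, and the failure causes the engine to stop hunting. With regex #3, however, the engine gets another match: an empty string. Then the /g option tells the engine to try the entire pattern yet again. Now there really is nothing left to match, neither regular characters nor the trailing empty string, so the process stops.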
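The sequence of attempts can be watched by stepping through the /g match in scalar context and recording pos() after each success. This is a rough sketch, assuming the same "foo" string; the @steps and @plus names are only illustrative.

```perl
use strict;
use warnings;

my $str = "foo";
my @steps;

# In scalar context each /g match resumes at pos($str).
while ( $str =~ /(.*)/g ) {
    push @steps, sprintf "matched '%s', pos now %d", $1, pos($str);
}
print "$_\n" for @steps;
# matched 'foo', pos now 3
# matched '', pos now 3

# With .+ there is no empty match to find, so there is one result.
my @plus = $str =~ /(.+)/g;
print scalar(@plus), " match(es) with .+\n";   # prints "1 match(es) with .+"
```

The loop ends after the second pass because Perl does not accept another zero-length match at the position where the previous zero-length match ended.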