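I'm using the following regular expression to match and capture parts of the string "weather in foo bar":
weather in ([a-z]+|[0-9]{5})\s?([a-zA-Z]+)?
It matches and captures both parts: "bar" is optional, and "foo" can be a city or a zip code.
However, I'd like to let users write "weather in foo for bar" as well, because I've accidentally typed it that way a few times myself. Is there a way to optionally match a literal string such as "for" without resorting to \s?f?o?r?\s? ?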
Answer 0 (score: 6)
Put it in a non-capturing group: (?:\sfor\s)?
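As a minimal sketch, the non-capturing group could be dropped into the question's pattern right after the first capture group. The helper name parse_weather and the sample inputs below are illustrative assumptions, not part of the original answer.

```perl
use strict;
use warnings;

# The question's pattern with an optional, non-captured " for " inserted
# after the first group (assumed placement).
my $weather_re = qr/weather in ([a-z]+|[0-9]{5})(?:\sfor\s)?\s?([a-zA-Z]+)?/;

# Hypothetical helper: returns (place, qualifier); qualifier is '' when absent.
sub parse_weather {
    my ($text) = @_;
    return unless $text =~ $weather_re;
    return ($1, defined $2 ? $2 : '');
}

for my $input ('weather in foo bar', 'weather in foo for bar', 'weather in 12345') {
    my ($place, $rest) = parse_weather($input);
    print "'$place', '$rest'\n";
}
```

Because the group requires whitespace on both sides of "for", an input that ends in "for" (such as "weather in foo for") still ends up with "for" in the second capture.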
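Answer 1 (score: 1)
Keeping all 3 capture groups intact takes a little more work. This may be somewhat advanced, but it is a good example of where assertions are useful: the lookbehind (?<=\s) and the lookahead (?=\s|$) make sure "for" is only matched as a whole word.
/weather\s+in\s+([[:alpha:]]+|\d{5})\s*((?<=\s)for(?=\s|$)|)\s*((?<=\s)[[:alpha:]]+|)/
Test cases in Perl: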
use strict;
use warnings;

my @samples = (
    'this is the weather in 12345 forever',
    'this is the weather in 32156 for ',
    'this is the weather in 32156 for today',
    'this is the weather in abcdefghijk for',
    'this is the weather in abcdefghijk ',
    'this is the weather in abcdefghijk end',
);

my $regex = qr/
    weather \s+ in \s+      # literal words separated by whitespace
    (                       # Group 1
        [[:alpha:]]+        #   city (letters only, no spaces)
      | \d{5}               #   OR zip code (5 digits)
    )                       # end group 1
    \s*                     # optional whitespace
    (                       # Group 2
        (?<=\s)             #   whitespace must precede
        for                 #   literal 'for'
        (?=\s|$)            #   whitespace or end of string must follow
      |                     #   OR match nothing
    )                       # end group 2
    \s*                     # optional whitespace
    (                       # Group 3
        (?<=\s)             #   whitespace must precede
        [[:alpha:]]+        #   one or more letters
      |                     #   OR match nothing
    )                       # end group 3
/x;

for (@samples) {
    # $regex was compiled with the x flag, so it is not repeated here
    if (/$regex/) {
        print "'$1', '$2', '$3'\n";
    }
}
Output:
'12345', '', 'forever'
'32156', 'for', ''
'32156', 'for', 'today'
'abcdefghijk', 'for', ''
'abcdefghijk', '', ''
'abcdefghijk', '', 'end'
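To see what the assertions contribute, the first sample can be run against a variant of the pattern with the lookarounds stripped out. This is only a comparison sketch: the variant pattern and the helper show_captures are assumptions made for illustration, not something the answer proposes.

```perl
use strict;
use warnings;

# The answer's pattern, with the lookbehind and lookahead assertions in place.
my $with_assertions = qr/weather\s+in\s+([[:alpha:]]+|\d{5})\s*((?<=\s)for(?=\s|$)|)\s*((?<=\s)[[:alpha:]]+|)/;

# Hypothetical variant for comparison only: same pattern, assertions removed.
my $without_assertions = qr/weather\s+in\s+([[:alpha:]]+|\d{5})\s*(for|)\s*([[:alpha:]]+|)/;

# Hypothetical helper: returns the three captures as one printable string.
sub show_captures {
    my ($regex, $text) = @_;
    my @groups = $text =~ $regex;
    return 'no match' unless @groups;
    return join ', ', map { "'$_'" } @groups;
}

my $sample = 'this is the weather in 12345 forever';
print show_captures($with_assertions,    $sample), "\n";  # '12345', '', 'forever'
print show_captures($without_assertions, $sample), "\n";  # '12345', 'for', 'ever'
```

Without the lookahead, "forever" is split into "for" and "ever"; with it, the word stays whole and lands in group 3.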