My search text is as follows.
...
...
var strings = ["aaa","bbb","ccc","ddd","eee"];
...
...
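It contains many lines (it is actually a JavaScript file), but I need to parse the values in the variable strings, i.e. aaa, bbb, ccc, ddd, eee.
Here is the Perl code; a PHP version is at the bottom.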
my $str = <<STR;
...
...
var strings = ["aaa","bbb","ccc","ddd","eee"];
...
...
STR
my @matches = $str =~ /(?:\"(.+?)\",?)/g;
print "@matches";
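I know the script above matches every occurrence, but it would also pick up strings on other lines (such as "xyz"). So I need to check for the text var strings = first: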
/var strings = \[(?:\"(.+?)\",?)/g
With the regex above, it captures aaa.
/var strings = \[(?:\"(.+?)\",?)(?:\"(.+?)\",?)/g
With the one above, I get aaa and bbb. So, to avoid repeating the regex, I used the '+' quantifier, as below.
/var strings = \[(?:\"(.+?)\",?)+/g
But I only got eee. So my question is: why do I get only eee when I use the '+' quantifier?
Update 1: using PHP preg_match_all (done to get more attention :-))
$str = <<<STR
...
...
var strings = ["aaa","bbb","ccc","ddd","eee"];
...
...
STR;
preg_match_all("/var strings = \[(?:\"(.+?)\",?)+/",$str,$matches);
print_r($matches);
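Update 2: Why does it match eee? Because of the greediness of (?:\"(.+?)\",?)+. With the greediness removed, /var strings = \[(?:\"(.+?)\",?)+?/, aaa is matched. But why is there only one result? Is there any way to achieve this with a single regex?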
Answer 0 (score: 2)
Here is a single-regex solution:
/(?:\bvar\s+strings\s*=\s*\[|\G,)\s*"([^"]*)"/g
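\G is a zero-width assertion that matches the position where the previous match ended (or the beginning of the string, if it is the first match attempt). So this acts like: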
var\s+strings\s*=\s*\[\s*"([^"]*)"
...on the first attempt, then:
,\s*"([^"]*)"
...after that, but each match must start exactly where the previous match ended.
Here's a demo in PHP, but it works in Perl as well.
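Since the answer says the pattern also works in Perl, here is a minimal Perl sketch of it. The two extra `var` lines in the sample input are hypothetical decoys added for illustration; they are not in the original question.

```perl
use strict;
use warnings;

# Sample input; the "other" and "more" lines are hypothetical decoys.
my $str = <<'STR';
var other = ["xyz"];
var strings = ["aaa","bbb","ccc","ddd","eee"];
var more = ["uvw"];
STR

# In list context with /g, every successful match contributes its $1.
# The first match must find "var strings = ["; each later match must
# begin with a comma located exactly where the previous match ended (\G).
my @matches = $str =~ /(?:\bvar\s+strings\s*=\s*\[|\G,)\s*"([^"]*)"/g;

print "@matches\n";   # aaa bbb ccc ddd eee
```

The decoy strings are skipped because neither alternative can start a match at them: they are not preceded by `var strings = [`, and `\G` only holds at the end of the previous match.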
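Answer 1 (score: 2)
You may prefer this solution, which first finds the string var strings = [ using the /g modifier. That sets \G to match immediately after the [ for the next regex, which looks for all immediately following double-quoted strings, possibly preceded by commas or whitespace.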
my @matches;
if ($str =~ /var \s+ strings \s* = \s* \[ /gx) {
@matches = $str =~ /\G [,\s]* "([^"]+)" /gx;
}
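Despite the /g modifier, your regex /var strings = \[(?:\"(.+?)\",?)+/g matches only once, because there is no second occurrence of var strings = [. Each match returns a list of the values of the capture variables $1, $2, $3 etc. as they stand when the match completes, and /(?:"(.+?)",?)+/ (there is no need to escape the double quotes) captures multiple values into $1 in turn, leaving only the final one. You need to write something like the above, which captures just one value into $1 on each match.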
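To make the role of the match position concrete, here is a small sketch of the same two-step approach that also records `pos($str)`. The one-line input and the `$start` variable are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical one-line input; the second array is a decoy that must not be captured.
my $str = 'var strings = ["aaa","bbb"]; var t = ["xyz"];';

my @matches;
my $start;
if ($str =~ /var \s+ strings \s* = \s* \[ /gx) {
    # A scalar-context /g match leaves pos($str) just after the "[".
    $start = pos($str);
    # The list-context /g match resumes from pos($str); \G pins each
    # match to the end of the previous one, so matching stops at the "]".
    @matches = $str =~ /\G [,\s]* "([^"]+)" /gx;
}

print "start=$start matches=@matches\n";   # start=15 matches=aaa bbb
```

Because every match is anchored with `\G`, the engine cannot skip ahead past the closing bracket to reach "xyz".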
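Answer 2 (score: 1)
Because the + tells the engine to repeat the group (?:"(.+?)",?) one or more times within a single match. Each repetition overwrites capture group 1, so by the time the group has matched "eee" and no further repetition is possible, only "eee" is left in it.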
use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new(qr/var strings = \[(?:"(.+?)",?)+/)->explain();
The regular expression:
(?-imsx:var strings = \[(?:"(.+?)",?)+)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  var strings =            'var strings = '
----------------------------------------------------------------------
  \[                       '['
----------------------------------------------------------------------
  (?:                      group, but do not capture (1 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    (                        group and capture to \1:
----------------------------------------------------------------------
      .+?                      any character except \n (1 or more
                               times (matching the least amount
                               possible))
----------------------------------------------------------------------
    )                        end of \1
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    ,?                       ',' (optional (matching the most amount
                             possible))
----------------------------------------------------------------------
  )+                       end of grouping
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
A simpler example is:
my @m = ('abcd' =~ m/(\w)+/g);
print "@m";
This prints only d. The reason is:
use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new(qr/(\w)+/)->explain();
The regular expression:
(?-imsx:(\w)+)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1 (1 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    \w                       word characters (a-z, A-Z, 0-9, _)
----------------------------------------------------------------------
  )+                       end of \1 (NOTE: because you are using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \1)
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
If you put a quantifier on a capture group, only the last repetition is kept.
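As a quick illustration of that rule, the sketch below contrasts three placements of the repetition on the same 'abcd' input. The variable names are chosen here for illustration only.

```perl
use strict;
use warnings;

# Quantifier on the capture group: one match, $1 overwritten on each repetition.
my @last  = 'abcd' =~ /(\w)+/;

# No quantifier on the group; /g repeats the whole match, yielding one $1 per match.
my @each  = 'abcd' =~ /(\w)/g;

# Quantifier inside the group: the whole run is captured at once.
my @whole = 'abcd' =~ /(\w+)/;

print "@last\n";    # d
print "@each\n";    # a b c d
print "@whole\n";   # abcd
```

To collect every item, the repetition has to come from /g (or from a second regex), not from a quantifier wrapped around the capture group.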
Here is one way that works:
my $str = <<STR;
...
...
var strings = ["aaa","bbb","ccc","ddd","eee"];
...
...
STR
my @matches;
$str =~ m/var strings = \[(.+?)\]/; # get the array first
my $jsarray = $1;
@matches = $jsarray =~ m/"(.+?)"/g; # and get the strings from that
print "@matches";
Update: A one-line solution (though not a single regex) would be:
@matches = ($str =~ m/var strings = \[(.+?)\]/)[0] =~ m/"(.+?)"/g;
But this is quite unreadable, IMHO.
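If readability is the concern, the same two-step idea could be wrapped in a small subroutine. This is only a sketch: `extract_js_strings` is a hypothetical helper name, and it assumes that no string in the array contains a `]` or an escaped double quote.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the double-quoted strings of "var NAME = [...]".
sub extract_js_strings {
    my ($source, $name) = @_;

    # Step 1: grab the text between the brackets of the named array.
    my ($body) = $source =~ /\bvar\s+\Q$name\E\s*=\s*\[(.*?)\]/s
        or return;

    # Step 2: pull every double-quoted string out of that text.
    return $body =~ /"([^"]*)"/g;
}

my $str = <<'STR';
var other = ["xyz"];
var strings = ["aaa","bbb","ccc","ddd","eee"];
STR

my @strings = extract_js_strings($str, 'strings');
print "@strings\n";   # aaa bbb ccc ddd eee
```

Call it in list context. When the named array is not found, it returns an empty list.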