Question

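I am using a regular expression to extract data from round brackets (parentheses), for example extracting a,b from (a,b), as shown below. I have a file in which every line looks like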

this is the range of values (a1,b1) and [b1|a1]
this is the range of values (a2,b2) and [b2|a2]
this is the range of values (a3,b3) and [b3|a3]

I use the following expression to extract a1,b1, a2,b2 and so on:

@numbers = $_ =~ /\((.*),(.*)\)/

But what should I do if I want to extract the data from the square brackets [] instead? For example

this is the range of values (a1,b1) and [b1|a1]
this is the range of values (a1,b1) and [b2|a2]

I only need to extract/match the data inside the square brackets, not the data inside the parentheses.

Answer 1

[Update] In the meantime, I have written a blog post about the specific problem with .* that I describe below: Why Using .* in Regular Expressions Is Almost Never What You Actually Want

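If your identifiers a1, b1 and so on never contain commas or square brackets themselves, you should use a pattern along the lines of the following to avoid backtracking hell: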

/\[([^,\]]+),([^,\]]+)\]/

Here is a working example on Regex101.

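The problem with greedy quantifiers like .* is that you will most likely consume too much at the start, so the regex engine has to do a lot of backtracking. Even if you use non-greedy quantifiers, the engine will make more match attempts than necessary, because it consumes only one character at a time and then tries to advance its position in the pattern.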

(You could even use atomic groups to make the matching more efficient.)
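
The sample lines in the question separate the bracketed values with | rather than a comma, so the same negated-character-class idea might be adapted as in the following minimal sketch. The two input lines are copied from the question; the variable names and the swap of the comma for | inside the class are assumptions made for illustration, not part of the pattern above.

```perl
use strict;
use warnings;

my @lines = (
    'this is the range of values (a1,b1) and [b1|a1]',
    'this is the range of values (a2,b2) and [b2|a2]',
);

my @pairs;
for my $line (@lines) {
    # [^|\]]+ matches one or more characters that are neither '|' nor ']',
    # so each capture stops at its delimiter instead of overshooting it.
    if ($line =~ /\[([^|\]]+)\|([^|\]]+)\]/) {
        push @pairs, "$1,$2";
    }
}

print join(' ', @pairs), "\n";
```

Because neither capture can run past its own delimiter, the engine does not have to give characters back the way it does with .*.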

Answer 2

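#!/usr/bin/perl
use strict;
use warnings;

my @numbers;
# Read each line first, then chomp it; chomp's return value is not a
# reliable loop condition.
while (my $line = <DATA>) {
    chomp $line;
    if ($line =~ m|\[(.*),(.*)\]|) {
        push @numbers, ($1, $2);
    }
}
print "@numbers\n";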
__DATA__
this is the range of values [a1,b1]
this is the range of values [a2,b2]
this is the range of values [a3,b3]

Demo

Answer 3

You can use the non-greedy quantifier *? for the match:

my @numbers = $_ =~ /\[(.*?),(.*?)\]/g;

or

my @numbers = /\[(.*?),(.*?)\]/g;

for short.

Update

my @numbers = /\[(.*?)\|(.*?)\]/g;
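
As a rough illustration of what the /g form returns in list context, here is a small sketch. The input line with two bracketed pairs is a made-up example, not taken from the question's file.

```perl
use strict;
use warnings;

# Hypothetical line containing two bracketed pairs.
$_ = 'first [b1|a1] and second [b2|a2]';

# In list context, m//g returns every captured group of every match
# as one flat list.
my @numbers = /\[(.*?)\|(.*?)\]/g;

print "@numbers\n";
```

The captures of all matches end up in a single flat list, so pairs have to be regrouped by position if that structure matters.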

Answer 4

Use the following code:

$_ =~ /\[(.*?)\|(.*?)\]/g;

Now, if the pattern matches successfully, the extracted values will be stored in $1 and $2.
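
Since $1 and $2 are only meaningful after a successful match, it is probably safest to test the match result before reading them. The following is a minimal sketch under that assumption; the second input string is a made-up non-matching line, and the /g flag is left out because only the first pair per line is wanted here.

```perl
use strict;
use warnings;

my @results;
for my $line ('values (a1,b1) and [b1|a1]', 'no brackets here') {
    # Read $1 and $2 only when the match succeeded; a failed match does
    # not reset them, so they may hold values from an earlier match.
    if ($line =~ /\[(.*?)\|(.*?)\]/) {
        push @results, "first=$1 second=$2";
    }
    else {
        push @results, 'no match';
    }
}

print "$_\n" for @results;
```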

Answer 5

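I know I'm a little late, but none of the answers properly answers the OP's question; they actually match the whole thing together with the square brackets []. The OP clearly wants to match what is inside the brackets.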

To match everything inside the square brackets together with the brackets themselves (Example):

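\[[^\[\]]*]

To match everything inside the square brackets (excluding the brackets themselves), use a positive lookbehind and a positive lookahead (Example):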

(?<=\[)[^\[\]]*(?=\])
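
In Perl, the lookaround pattern above could be applied roughly as follows. This is only a sketch: the input line is taken from the question, while the variable names and the final split on | are assumptions added for illustration.

```perl
use strict;
use warnings;

my $line = 'this is the range of values (a1,b1) and [b1|a1]';

# (?<=\[) requires a '[' just before the match and (?=\]) a ']' just after,
# but neither bracket becomes part of the matched text.
my ($inside) = $line =~ /((?<=\[)[^\[\]]*(?=\]))/;

# Split on the literal '|' to separate the two values, if anything matched.
my @values = defined $inside ? split(/\|/, $inside) : ();

print "$inside\n" if defined $inside;
print "@values\n";
```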

Extract data between square brackets "[]" using Perl
