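I want to split a comma-separated list of arguments into tokens, but ignore the separator when it appears inside double quotes or parentheses. For example: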
my @arr = some_function('one, "string with ,", func(a,func2(1,2))');
should produce:
$arr[0] -> one
$arr[1] -> "string with ,"
$arr[2] -> func(a,func2(1,2))
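I know I can ignore commas inside quotes with Text::ParseWords, but that will still split func(a,func2(1,2)) into multiple fields because it isn't quoted. Is there a clean way to do this, or do I have to write my own parser?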
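To make the problem concrete, here is a minimal sketch of the Text::ParseWords attempt described above. It assumes the delimiter is a comma followed by optional whitespace, and it passes a true "keep" argument so that the quotes stay in the returned fields; both choices are illustrative, not the only way to call parse_line.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $input = 'one, "string with ,", func(a,func2(1,2))';

# Arguments: delimiter pattern, "keep" flag (1 = retain quotes), line to split.
# parse_line honours quotes but knows nothing about parentheses.
my @fields = parse_line(',\s*', 1, $input);

print "$_\n" for @fields;
```

The quoted field survives intact, but the function call is cut at every comma inside its parentheses, which is the behaviour I want to avoid.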
Answer 0 (score: 3)
You can do this with Parse::RecDescent, which lets you define a grammar for the parse:
use strict;
use warnings 'all';
use 5.010;
use Data::Dumper;
use Parse::RecDescent;
use Regexp::Common qw(balanced);
my $grammar = q{
    # One or more fields, separated by commas
    startrule : field(s /,/) # / for broken Stack Overflow syntax highlighter

    # A field can be a function call, a double-quoted string, or bare text
    field : func
          | quoted
          | bare

    # A double-quoted string. Returned with quotes stripped
    quoted : /"[^"]*"/
        {
            $item[-1] =~ s/\A"|"\z//g; # / for broken Stack Overflow syntax highlighter
            $return = $item[-1]
        }

    # "Bare" text: not a function call and not a quoted string. May contain
    # spaces
    bare : /[^,]*/

    # A function name
    identifier : /\w+/
};

# This part uses qq{} so that the balanced-parentheses pattern from
# Regexp::Common is interpolated into the grammar text
$grammar .= qq{
    # A function call
    func : identifier /$RE{balanced}{-parens=>'()'}/
};

# The action for the func rule: glue the identifier and its argument list
# back together
$grammar .= q{
        { $return = join '', @item[1..$#item] }
};

my $parser = Parse::RecDescent->new($grammar) or die 'Bad grammar';

my $parsed = $parser->startrule(
    'one two, "string with ,", func(a,func2(1,2))'
);
print Dumper $parsed;
Output:
$VAR1 = [
'one two',
'string with ,',
'func(a,func2(1,2))'
];
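Note that this doesn't handle quoted fields that contain escaped quotes, but that is easy to add if you know which character is used for escaping.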
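As a rough illustration of that last point, the sketch below assumes the escape character is a backslash, which the original does not specify. It tests the pattern on its own, outside the grammar; $quoted_re and the sample input are made-up names for this example.

```perl
use strict;
use warnings;

# Assumption: a backslash escapes the next character inside a quoted field.
# Match an opening quote, then any run of ordinary characters or
# backslash-escaped characters, then the closing quote.
my $quoted_re = qr/"(?:[^"\\]|\\.)*"/;

my $input = '"say \"hi\", then stop", next';

my ($field) = $input =~ /\A($quoted_re)/;
die "no quoted field found" unless defined $field;

$field =~ s/\A"|"\z//g;    # strip the surrounding quotes, as the grammar does
$field =~ s/\\(.)/$1/g;    # unescape: \" becomes ", \\ becomes \

print "$field\n";          # prints: say "hi", then stop
```

To use a pattern like this in the quoted rule, keep in mind that the grammar lives in a q{} string, where a doubled backslash collapses to a single one, so the backslashes in the pattern would need to be doubled again there.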