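In the current implementation of the Scanless Interface (SLIF) in the Marpa parser, the lexer seems to perform longest token matching (LTM) over all terminals, without regard to which tokens the parser can accept at the current position.
This produces frustrating parse failures when my grammar contains a token that matches the longest substring but cannot occur at the current position. Consider the following code: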
#!/usr/bin/env perl
use strict; use warnings; use feature qw/say/; use utf8;
use Marpa::R2;
use Data::Dump;
my @data = ('! key : value', '! key:value');
my $grammar = Marpa::R2::Scanless::G->new({
source => \<<'END_GRAMMAR',
:default ::= action => [values]
:start ::= record
:discard ~ ws
ws ~ [\s]+
record ::= ('!') key (':') value
key ~ [\w]+
value ~ [^\s]+
END_GRAMMAR
});
for my $data (@data) {
my $recce = Marpa::R2::Scanless::R->new({
grammar => $grammar,
trace_terminals => 0, # set this to "1" to see how the tokens are recognized
});
$recce->read(\$data);
my $val = $recce->value // die "no parse";
say ">> $data";
dd $$val;
}
This produces the output:
>> ! key : value
["key", "value"]
Error in SLIF G1 read: No lexemes accepted at position 2
* Error was at end of input
* String before error: ! key:value
Marpa::R2 exception at marpa.pl line 33.
Expected output:
>> ! key : value
["key", "value"]
>> ! key:value
["key", "value"]
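After the ! has been recognized, a key token must follow. During lexing at this position, the value token matches the longest substring, key:value, even though value cannot occur at this position. Therefore, the parse fails.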
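Question: Is it possible to achieve the expected output without writing a manual lexer?
(I know that a lexer could query the recognizer for the expected tokens and restrict itself to matching only those, but I don't know how to convince the SLIF to do this for me.)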
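To make the difference concrete, here is a minimal core-Perl sketch that does not use Marpa at all. It only imitates the two lexing strategies with plain regexes. The `%lexeme` table and the `longest_at` helper are hypothetical names made up for this illustration, and the set of "expected" tokens is hard-coded rather than queried from a recognizer.

```perl
#!/usr/bin/env perl
use strict; use warnings;

# Token patterns copied from the grammar in the question.
# (%lexeme and longest_at are illustrative names, not Marpa API.)
my %lexeme = (
    bang  => qr/!/,
    colon => qr/:/,
    key   => qr/\w+/,
    value => qr/\S+/,
);

# Return [name, text] pairs for the longest match(es) at $pos,
# considering only the lexemes named in @candidates.
sub longest_at {
    my ($input, $pos, @candidates) = @_;
    my $rest = substr $input, $pos;
    my ($best_len, @winners) = (0);
    for my $name (sort @candidates) {
        next unless $rest =~ /\A($lexeme{$name})/;
        my $len = length $1;
        next if $len < $best_len;               # shorter than the best so far
        @winners = () if $len > $best_len;      # new longest: drop the others
        $best_len = $len;
        push @winners, [$name, $1];
    }
    return @winners;
}

my $input = '! key:value';
my $pos   = 2;    # just after "! ", where the grammar expects a key

# Plain LTM: every lexeme competes, whether or not the parser can use it.
my @ltm  = longest_at($input, $pos, keys %lexeme);
# Restricted matching: only lexemes acceptable here compete.
# The expected set ('key') is hard-coded for this sketch.
my @latm = longest_at($input, $pos, 'key');

printf "LTM:        %s => '%s'\n", @$_ for @ltm;
printf "restricted: %s => '%s'\n", @$_ for @latm;
```

With plain LTM the only survivor is value with the text key:value, which the parser then rejects. With the candidates restricted to key, the lexer yields key. For the input '! key : value' the same helper returns both key and value at position 2, because both match the three characters key; that tie would explain why the first test case parses.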
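I am running Marpa::R2 v2.064 on perl5 v16.2.
Following Jeffrey Kegler's suggestion, I implemented a rule that always matches a longer substring than a plain value, and is therefore preferred. Using a pause event, I can then parse it manually, although I have to keep a phantom rule around to get the correct semantics.
Here is the complete, updated code, including event handling and updated test cases: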
#!/usr/bin/env perl
use strict; use warnings; use feature qw/say/; use utf8;
use Marpa::R2;
use Data::Dump;
my @data = ('! key : value', '! key:value', '! key :value', '! key: value');
my $grammar = Marpa::R2::Scanless::G->new({
source => \<<'END_GRAMMAR',
:default ::= action => [values]
:start ::= Record
:discard ~ ws
ws ~ [\s]+
Record ::=
('!') Key (<Op colon>) Value # not directly used
| ('!') KeyValue
Key ~ key
Value ~ value
KeyValue~ key <ws any> ':' <ws any> value
:lexeme ~ KeyValue pause => before event => 'before KeyValue'
<Op colon> ~ ':'
key ~ [\w]+
value ~ [^\s]+
<ws any>~ [\s]*
END_GRAMMAR
});
my %events = (
'before KeyValue' => sub {
my ($recce, $string, $start, $length) = @_;
my ($k, $o, $v) = split /(\s*:\s*)/, $string, 2;
say STDERR qq(k="$k" o="$o" v="$v");
my $pos = $start;
$recce->lexeme_read('Key' => $pos, length($k), $k);
$pos += length $k;
$recce->lexeme_read('Op colon' => $pos, length($o), $o);
$pos += length $o;
$recce->lexeme_read('Value' => $pos, length($v), $v);
},
);
for my $data (@data) {
my $recce = Marpa::R2::Scanless::R->new({
grammar => $grammar,
trace_terminals => 0,
});
my $length = length $data;
for (
my $pos = $recce->read(\$data);
$pos < $length;
$pos = $recce->resume()
) {
say STDERR "pause";
my ($start, $length) = $recce->pause_span();
my $str = substr $data, $start, $length;
for my $event_data (@{ $recce->events }) {
my ($name) = @$event_data;
my $code = $events{$name} // die "no code for event $name";
$recce->$code($str, $start, $length);
}
}
my $val = $recce->value // die "no parse";
say ">> $data";
dd $$val;
}
This produces
>> ! key : value
["key", "value"]
>> ! key:value
["key", "value"]
>> ! key :value
["key", "value"]
>> ! key: value
["key", "value"]
which is the expected behaviour.
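The event handler relies on split with a capturing group and a limit of 2 to cut the paused KeyValue lexeme into key, separator, and value, and then advances an offset by the length of each piece. A small stand-alone sketch of just that step, with no Marpa involved (split_key_value is a hypothetical helper name used only here):

```perl
#!/usr/bin/env perl
use strict; use warnings;

# Split a matched "key : value" lexeme into three pieces and compute
# the absolute offset of each, given the offset where the lexeme starts.
# (split_key_value is an illustrative helper, not part of Marpa.)
sub split_key_value {
    my ($string, $start) = @_;
    # The capture group keeps the separator in the result list;
    # the limit of 2 stops splitting after the first colon.
    my ($k, $o, $v) = split /(\s*:\s*)/, $string, 2;
    my $k_pos = $start;
    my $o_pos = $k_pos + length $k;
    my $v_pos = $o_pos + length $o;
    return ([$k, $k_pos], [$o, $o_pos], [$v, $v_pos]);
}

# Offset 2 corresponds to the position just after "! " in the test inputs.
for my $lexeme ('key:value', 'key : value', 'key :a:b') {
    my @parts = split_key_value($lexeme, 2);
    print join(' ', map { "'$_->[0]'\@$_->[1]" } @parts), "\n";
}
```

Because of the limit, any further colons stay inside the value, which matches the value ~ [^\s]+ rule in the grammar.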
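Answer 0 (score: 6)
Note that since version 2.079_015, Marpa supports the concept of Longest Acceptable Tokens Matching (LATM), which means that simply adding:
lexeme default = forgiving => 1
to your grammar produces the expected output. That is: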
#!/usr/bin/env perl
use strict; use warnings;
use Marpa::R2;
use Data::Dump;
use feature qw/say/;
my $grammar = Marpa::R2::Scanless::G->new({source => \do {local $/; <DATA>}});
my @data = ('! key : value', '! key:value', '! key :value', '! key: value');
foreach (@data) {
my $r = Marpa::R2::Scanless::R->new({grammar => $grammar});
$r->read(\$_);
my $val = $r->value;
say ">> $_"; dd $$val;
}
__DATA__
:default ::= action => [values]
lexeme default = forgiving => 1
:start ::= record
:discard ~ ws
ws ~ [\s]+
record ::= ('!') key (':') value
key ~ [\w]+
value ~ [^\s]+
gives:
>> ! key : value
["key", "value"]
>> ! key:value
["key", "value"]
>> ! key :value
["key", "value"]
>> ! key: value
["key", "value"]
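As far as I recall from the Marpa::R2 Scanless DSL documentation, forgiving is a synonym for the latm adverb, and latm is the spelling the documentation itself uses. Treat that as an assumption and check it against the documentation for your installed version. Under that assumption, the one-line change to the grammar could also be written as:

```
lexeme default = latm => 1
```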
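Answer 1 (score: 2)
Copying a comment, at Ross's suggestion:
You could create a rule of the form record ::= ('!') <complex record>, where <complex record> contains no whitespace and two or more colons.
Pause at <complex record>, inspect the pause using the pause_lexeme or events method, and then resume to continue parsing normally.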