How do I make an ANTLR rule "greedy"?

Time: 2015-07-16 12:16:42

Tags: .net parsing grammar antlr4 context-free-grammar

I recently started using ANTLR to generate a simple parser for interpolated strings. Here are some sample input strings (one per line):

Hello {User.Name}!
Welcome on Planet {GetPlanetName(" A stupid string param :-} ")}
Plain String without an interpolated expression
String with escaped {{ brackets }}

The grammar that decides whether something is a plain string (plainString) or an expression (expressionString) is as follows:

grammar T;

patternString:                  (plainString | expressionString)+
                                ;

plainString:                    (CBO_ESCAPESEQUENCE | CBC_ESCAPESEQUENCE | PLAINSTRINGLITERAL)+
                                ;

expressionString:               CBO expression CBC | CURLYBRACKETS_EMPTY
                                ;

expression:                     expressionSegment+
                                ;

expressionSegment:              ~('"' | '\'' | '{' | '(' | '[' | '}' | ')' | ']' | CBO_ESCAPESEQUENCE | CBC_ESCAPESEQUENCE)+
                                | '(' expressionSegment+ ')' | '(' WS ')' | '()'
                                | '[' expressionSegment+ ']' | '[' WS ']' | '[]'
                                | '{' expressionSegment+ '}' | CURLYBRACKETS_EMPTY
                                | stringLiteral
                                | charLiteral
                                ;

stringLiteral:                  '"' (~('"') | '\\"')+ '"'
                                | '""'
                                ;

charLiteral:                    '\'' (~('\'') | '\\\'')+ '\''
                                ;

fragment WS:                    (' ' | '\r' | '\n' | '\t')+;

PLAINSTRINGLITERAL:             ~('{' | '}');
CURLYBRACKETS_EMPTY:            (CBO WS CBC | CBO CBC);
CBO:                            '{';
CBC:                            '}';

fragment CBO_ESCAPESEQUENCE:    '{{';
fragment CBC_ESCAPESEQUENCE:    '}}';

This works quite well, except for some strings such as the following:

{{{new [] {1,2,3,4}}}}

This gives me the following AST:

PatternString                                 => '{{{new[]{1, 2, 3, 4}}}}'
    ExpressionString                          => '{{{new[]{1, 2, 3, 4}}}}'
        Expression                            => '{{new[]{1, 2, 3, 4}}}'
            ExpressionSegment                 => '{{new[]{1, 2, 3, 4}}}'
                ExpressionSegment             => '{new[]{1, 2, 3, 4}}'
                    ExpressionSegment         => 'new[]'
                    ExpressionSegment         => '{1, 2, 3, 4}'
                        ExpressionSegment     => '1, 2, 3, 4'

Whereas I expected (and wanted) the following AST:

PatternString                                 => '{{{new[]{1, 2, 3, 4}}}}'
    PlainString                               => '{{'
    ExpressionString                          => '{new[]{1, 2, 3, 4}}'
        Expression                            => 'new[]{1, 2, 3, 4}'
            ExpressionSegment                 => 'new[]'
            ExpressionSegment                 => '{1, 2, 3, 4}'
                ExpressionSegment             => '1, 2, 3, 4'
    PlainString                               => '}}'

In other words, plainString should be greedier and consume as many escaped brackets as possible. How can I achieve this in the grammar above?
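To pin down the split being asked for, here is a small hand-written scanner in Python. It is not ANTLR and not a fix for the grammar, only a sketch of the expected result under the assumptions stated in the question: outside an expression, `{{` and `}}` are consumed as escaped plain text before a single `{` is considered, and inside an expression, nested braces and quoted literals are skipped. The names `split_interpolated`, `_find_expression_end` and `_skip_quoted` are made up for this illustration.

```python
def _skip_quoted(text, start):
    """Return the index just past the string/char literal starting at `start`."""
    quote = text[start]
    i = start + 1
    while i < len(text):
        if text[i] == "\\":
            i += 2  # skip the escaped character
        elif text[i] == quote:
            return i + 1
        else:
            i += 1
    raise ValueError("unterminated literal starting at index %d" % start)


def _find_expression_end(text, start):
    """Return the index of the '}' that closes the expression beginning at `start`."""
    depth = 0
    i = start
    while i < len(text):
        ch = text[i]
        if ch in "\"'":
            i = _skip_quoted(text, i)  # braces inside literals do not count
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            if depth == 0:
                return i
            depth -= 1
        i += 1
    raise ValueError("unterminated expression starting at index %d" % start)


def split_interpolated(text):
    """Split `text` into ("plain", s) and ("expr", s) parts."""
    parts, plain = [], []
    i = 0
    while i < len(text):
        two = text[i:i + 2]
        if two in ("{{", "}}"):
            # Greedy step: an escaped bracket pair is tried before a single brace.
            plain.append(two)
            i += 2
        elif text[i] == "{":
            if plain:
                parts.append(("plain", "".join(plain)))
                plain = []
            end = _find_expression_end(text, i + 1)
            parts.append(("expr", text[i + 1:end]))
            i = end + 1
        elif text[i] == "}":
            raise ValueError("unbalanced closing brace at index %d" % i)
        else:
            plain.append(text[i])
            i += 1
    if plain:
        parts.append(("plain", "".join(plain)))
    return parts


if __name__ == "__main__":
    print(split_interpolated("{{{new [] {1,2,3,4}}}}"))
```

For the problematic input this yields a plain part `{{`, an expression part `new [] {1,2,3,4}`, and a plain part `}}`, which is the shape of the AST I am after.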

1 answer:

Answer 0: (score: 3)

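I think your problem comes from explicitly defining lexer rules for the opening and closing curly braces and then also referring to those braces as string literals in some parser rules. Changing the expressionSegment rule so that it references only lexer rules seems to resolve the issue. Please try this grammar and see whether it fixes your problem: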

expressionString:               CBO expression CBC | CURLYBRACKETS_EMPTY
                                ;

expression:                     expressionSegment+
                                ;

expressionSegment:
                                  L_PAREN expressionSegment+ R_PAREN
                                | L_BRACKET expressionSegment+ R_BRACKET
                                | CBO expressionSegment+ CBC
                                | L_PAREN WS R_PAREN
                                | L_BRACKET WS R_BRACKET
                                | L_PAREN R_PAREN
                                | L_BRACKET R_BRACKET
                                | CURLYBRACKETS_EMPTY
                                | stringLiteral
                                | charLiteral
                                | ~(DOUBLE_QUOTE | SINGLE_QUOTE | CBC | CBO | L_PAREN | L_BRACKET | R_PAREN | R_BRACKET)+
                                ;

stringLiteral:                  '"' (~('"') | '\\"')+ '"'
                                | '""'
                                ;

charLiteral:                    '\'' (~('\'') | '\\\'')+ '\''
                                ;

WS:                             (' ' | '\r' | '\n' | '\t')+;

PLAINSTRINGLITERAL:             ~('{' | '}');
CURLYBRACKETS_EMPTY:            (CBO WS CBC | CBO CBC);
CBO:                            '{';
CBC:                            '}';
L_PAREN:                        '(';
R_PAREN:                        ')';
L_BRACKET:                      '[';
R_BRACKET:                      ']';
SINGLE_QUOTE:                   '\'';
DOUBLE_QUOTE:                   '"';

As the parse tree shows, the result appears to match what you are looking for:

[screenshot of the resulting parse tree]
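When comparing grammars like these, it may also help to look at the token stream the parser actually receives. An ANTLR lexer runs independently of the parser rules: at each position it takes the longest match, and on a tie the rule defined first wins. Fragment rules never become tokens on their own. The Python sketch below imitates only that selection logic for the brace-related lexer rules of the question's grammar. It is a rough simulation, not ANTLR itself: `RULES` and `tokenize` are illustrative names, and the implicit tokens ANTLR creates for string literals used in parser rules are left out.

```python
import re

# Brace-related lexer rules, in the order they appear in the question's grammar.
# The fragment rules CBO_ESCAPESEQUENCE and CBC_ESCAPESEQUENCE are omitted on
# purpose: fragments only help build other lexer rules and are never emitted.
RULES = [
    ("PLAINSTRINGLITERAL", re.compile(r"[^{}]")),
    ("CURLYBRACKETS_EMPTY", re.compile(r"\{[ \r\n\t]*\}")),
    ("CBO", re.compile(r"\{")),
    ("CBC", re.compile(r"\}")),
]


def tokenize(text):
    """Return (rule name, matched text) pairs using longest-match, first-rule-wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in RULES:
            match = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on a tie the earlier rule is kept.
            if match and match.end() > best_end:
                best_name, best_end = name, match.end()
        if best_name is None:
            raise ValueError("no rule matches at index %d" % pos)
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


if __name__ == "__main__":
    for name, value in tokenize("{{x}}"):
        print(name, repr(value))
```

Under these assumptions `{{x}}` comes out as CBO, CBO, PLAINSTRINGLITERAL, CBC, CBC, so no escape-sequence token ever reaches the parser. If the real lexer behaves the same way, it is worth checking whether the escape rules need to be ordinary (non-fragment) lexer rules before plainString can match them at all.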