Why does parsing fail after upgrading from ANTLR 3 to ANTLR 4?

Date: 2017-04-14 15:39:16

Tags: migration antlr upgrade antlr4

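I have recently been trying to upgrade my project from ANTLR 3 to ANTLR 4. However, after changing the grammar file, equations that used to parse no longer do. I am new to ANTLR 4, so I can't tell which of my changes broke it.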

Here is my original grammar file:

grammar equation;
options {
    language=CSharp2;
    output=AST;
    ASTLabelType=CommonTree;
}   

tokens {
    VARIABLE;  
    CONSTANT;  
    EXPR;
    PAREXPR;
    EQUATION;
    UNARYEXPR;
    FUNCTION;
    BINARYOP;
    LIST;
}


equationset:    equation* EOF!;
equation:   variable ASSIGN expression -> ^(EQUATION variable expression)
    ;

parExpression 
    :   LPAREN expression RPAREN -> ^(PAREXPR expression)
    ;

expression
    :   conditionalexpression -> ^(EXPR conditionalexpression)
    ;

conditionalexpression
    :   orExpression
    ;

orExpression
    :   andExpression ( OR^ andExpression )* 
    ;

andExpression
    :   comparisonExpression ( AND^ comparisonExpression )*;


comparisonExpression: 
    additiveExpression ((EQ^ | NE^ | LTE^ | GTE^ | LT^ | GT^) additiveExpression)*;


additiveExpression
    :   multiplicativeExpression ( (PLUS^ | MINUS^) multiplicativeExpression )*
    ;

multiplicativeExpression
    :   unaryExpression ( ( TIMES^ | DIVIDE^) unaryExpression )*
    ;

unaryExpression
    :   NOT unaryExpression -> ^(UNARYEXPR NOT unaryExpression)
    |   MINUS unaryExpression  -> ^(UNARYEXPR MINUS unaryExpression)
    | exponentexpression;

exponentexpression
    :   primary (CARET^ primary)*;

primary :   parExpression | constant | booleantok | variable | function;

numeric:        INTEGER | REAL;
constant:       STRING -> ^(CONSTANT STRING) | numeric -> ^(CONSTANT numeric);
booleantok  :   BOOLEAN -> ^(BOOLEAN);
scopedidentifier
    :   (IDENTIFIER DOT)* IDENTIFIER -> IDENTIFIER+;
function
    :   scopedidentifier LPAREN argumentlist RPAREN -> ^(FUNCTION scopedidentifier argumentlist);
variable:   scopedidentifier -> ^(VARIABLE scopedidentifier);

argumentlist:   (expression) ? (COMMA! expression)*;  

WS  : (' '|'\r'|'\n'|'\t')+ {$channel=HIDDEN;};

COMMENT :   '/*' .* '*/' {$channel=HIDDEN;};

LINE_COMMENT : '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;};

STRING: (('\"') ( (~('\"')) )* ('\"'))+;

fragment ALPHA: 'a'..'z'|'_';
fragment DIGIT: '0'..'9';
fragment ALNUM: ALPHA|DIGIT;

EQ  :   '==';
ASSIGN  :   '=';
NE  :   '!=' | '<>';
OR  :   'or' | '||';
AND :   'and' | '&&';
NOT :   '!'|'not';
LTE :   '<=';
GTE :   '>=';
LT  :   '<';
GT  :   '>';
TIMES   :   '*';
DIVIDE  :   '/';

BOOLEAN :   'true' | 'false';

IDENTIFIER: ALPHA (ALNUM)* | ('[' (~(']'))+ ']') ;

REAL: DIGIT* DOT DIGIT+ ('e' (PLUS | MINUS)? DIGIT+)?;
INTEGER: DIGIT+;


PLUS    :   '+';
MINUS   :   '-';
COMMA   :   ',';
RPAREN  :   ')';
LPAREN  :   '(';
DOT :   '.';
CARET   :   '^';

Here is the grammar after my changes:

grammar equation;
options {

}   

tokens {
    VARIABLE;  
    CONSTANT;  
    EXPR;
    PAREXPR;
    EQUATION;
    UNARYEXPR;
    FUNCTION;
    BINARYOP;
    LIST;
}


equationset:    equation* EOF;
equation:   variable ASSIGN expression
    ;

parExpression 
    :   LPAREN expression RPAREN
    ;

expression
    :   conditionalexpression
    ;

conditionalexpression
    :   orExpression
    ;

orExpression
    :   andExpression ( OR andExpression )* 
    ;

andExpression
    :   comparisonExpression ( AND comparisonExpression )*;


comparisonExpression: 
    additiveExpression ((EQ | NE | LTE | GTE | LT | GT) additiveExpression)*;


additiveExpression
    :   multiplicativeExpression ( (PLUS | MINUS) multiplicativeExpression )*
    ;

multiplicativeExpression
    :   unaryExpression ( ( TIMES | DIVIDE) unaryExpression )*
    ;

unaryExpression
    :   NOT unaryExpression
    |   MINUS unaryExpression
    | exponentexpression;

exponentexpression
    :   primary (CARET primary)*;

primary :   parExpression | constant | booleantok | variable | function;

numeric:        INTEGER | REAL;
constant:       STRING | numeric;
booleantok  :   BOOLEAN;
scopedidentifier
    :   (IDENTIFIER DOT)* IDENTIFIER;
function
    :   scopedidentifier LPAREN argumentlist RPAREN;
variable:   scopedidentifier;

argumentlist:   (expression) ? (COMMA expression)*;  

WS  : (' '|'\r'|'\n'|'\t')+ ->channel(HIDDEN);

COMMENT :   '/*' .* '*/' ->channel(HIDDEN);

LINE_COMMENT : '//' ~('\n'|'\r')* '\r'? '\n' ->channel(HIDDEN);

STRING: (('\"') ( (~('\"')) )* ('\"'))+;

fragment ALPHA: 'a'..'z'|'_';
fragment DIGIT: '0'..'9';
fragment ALNUM: ALPHA|DIGIT;

EQ  :   '==';
ASSIGN  :   '=';
NE  :   '!=' | '<>';
OR  :   'or' | '||';
AND :   'and' | '&&';
NOT :   '!'|'not';
LTE :   '<=';
GTE :   '>=';
LT  :   '<';
GT  :   '>';
TIMES   :   '*';
DIVIDE  :   '/';

BOOLEAN :   'true' | 'false';

IDENTIFIER: ALPHA (ALNUM)* | ('[' (~(']'))+ ']') ;

REAL: DIGIT* DOT DIGIT+ ('e' (PLUS | MINUS)? DIGIT+)?;
INTEGER: DIGIT+;


PLUS    :   '+';
MINUS   :   '-';
COMMA   :   ',';
RPAREN  :   ')';
LPAREN  :   '(';
DOT :   '.';
CARET   :   '^';

A sample equation I am trying to parse (it used to work fine) is:

[a].[b] = 1.76 * [Product_DC].[PDC_Inbound_Pallets] * if(product_dc.[PDC_DC] =="US84",1,0)

Thanks in advance.

1 answer:

Answer 0 (score: 0)

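  • Entries in the tokens section must be separated by commas (,), not terminated by semicolons (;). See also the Token Section paragraph in the official documentation.
  • Since ANTLR 4.7, a double quote does not need to be escaped with a backslash. STRING: (('\"') ( (~('\"')) )* ('\"'))+; should be rewritten as STRING: ('"' ~'"'* '"')+;
  • The multi-line comment token is missing the question mark that makes the loop non-greedy. In ANTLR 4, .* is greedy, so '/*' .* '*/' runs on to the last */ in the input. Change '/*' .* '*/' -> '/*' .*? '*/'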
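The comment and string fixes can be illustrated outside ANTLR with Python regular expressions, whose greedy `.*`, non-greedy `.*?` and negated character classes behave analogously for these simple patterns. This is only an analogy under that assumption; it does not run ANTLR or the generated lexer.

```python
import re

# Python's re module used as an analogy only (this is not ANTLR):
# ANTLR 4 loops such as .* are greedy, and .*? is the non-greedy form.
source = "/* first */ x = 1 /* second */"

greedy = re.match(r"/\*.*\*/", source, re.DOTALL)       # like '/*' .* '*/'
non_greedy = re.match(r"/\*.*?\*/", source, re.DOTALL)  # like '/*' .*? '*/'

print(greedy.group())      # /* first */ x = 1 /* second */  (code swallowed)
print(non_greedy.group())  # /* first */

# The simplified rule STRING: ('"' ~'"'* '"')+ as an equivalent regex:
# one or more double-quoted runs, each containing no double quote.
STRING = re.compile(r'(?:"[^"]*")+')
print(bool(STRING.fullmatch('"US84"')))         # True
print(bool(STRING.fullmatch('"a""b"')))         # True (two adjacent runs)
print(bool(STRING.fullmatch('"unterminated')))  # False
```

With the greedy form, everything between the first /* and the last */ becomes one hidden comment token, so real code in between silently disappears from the parser's view.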

With these fixes applied, the grammar looks like this:

grammar equation;

options {

}   

tokens {
    VARIABLE,
    CONSTANT,
    EXPR,
    PAREXPR,
    EQUATION,
    UNARYEXPR,
    FUNCTION,
    BINARYOP,
    LIST
}


equationset:    equation* EOF;
equation:   variable ASSIGN expression
    ;

parExpression 
    :   LPAREN expression RPAREN
    ;

expression
    :   conditionalexpression
    ;

conditionalexpression
    :   orExpression
    ;

orExpression
    :   andExpression ( OR andExpression )* 
    ;

andExpression
    :   comparisonExpression ( AND comparisonExpression )*;


comparisonExpression: 
    additiveExpression ((EQ | NE | LTE | GTE | LT | GT) additiveExpression)*;


additiveExpression
    :   multiplicativeExpression ( (PLUS | MINUS) multiplicativeExpression )*
    ;

multiplicativeExpression
    :   unaryExpression ( ( TIMES | DIVIDE) unaryExpression )*
    ;

unaryExpression
    :   NOT unaryExpression
    |   MINUS unaryExpression
    | exponentexpression;

exponentexpression
    :   primary (CARET primary)*;

primary :   parExpression | constant | booleantok | variable | function;

numeric:        INTEGER | REAL;
constant:       STRING | numeric;
booleantok  :   BOOLEAN;
scopedidentifier
    :   (IDENTIFIER DOT)* IDENTIFIER;
function
    :   scopedidentifier LPAREN argumentlist RPAREN;
variable:   scopedidentifier;

argumentlist:   (expression) ? (COMMA expression)*;  

WS  : (' '|'\r'|'\n'|'\t')+ ->channel(HIDDEN);

COMMENT :   '/*' .*? '*/' -> channel(HIDDEN);

LINE_COMMENT : '//' ~('\n'|'\r')* '\r'? '\n' ->channel(HIDDEN);

STRING: ('"' ~'"'* '"')+;

fragment ALPHA: 'a'..'z'|'_';
fragment DIGIT: '0'..'9';
fragment ALNUM: ALPHA|DIGIT;

EQ  :   '==';
ASSIGN  :   '=';
NE  :   '!=' | '<>';
OR  :   'or' | '||';
AND :   'and' | '&&';
NOT :   '!'|'not';
LTE :   '<=';
GTE :   '>=';
LT  :   '<';
GT  :   '>';
TIMES   :   '*';
DIVIDE  :   '/';

BOOLEAN :   'true' | 'false';

IDENTIFIER: ALPHA (ALNUM)* | ('[' (~(']'))+ ']') ;

REAL: DIGIT* DOT DIGIT+ ('e' (PLUS | MINUS)? DIGIT+)?;
INTEGER: DIGIT+;


PLUS    :   '+';
MINUS   :   '-';
COMMA   :   ',';
RPAREN  :   ')';
LPAREN  :   '(';
DOT :   '.';
CARET   :   '^';
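As a quick sanity check that the fixed lexer rules cover every character of the sample equation, the sketch below models them in Python. It is an approximation, not generated ANTLR code: `RULES`, `HIDDEN` and `tokenize` are hypothetical names, each lexer rule is hand-translated into a regex, and the loop mimics ANTLR's policy of taking the longest match, with ties going to the rule defined first. Tokens that the grammar sends to the HIDDEN channel are simply dropped.

```python
import re

# Hypothetical model of the fixed grammar's lexer rules, in grammar order.
RULES = [
    ("WS", r"[ \r\n\t]+"),
    ("COMMENT", r"/\*.*?\*/"),            # non-greedy, like '/*' .*? '*/'
    ("LINE_COMMENT", r"//[^\r\n]*\r?\n"),
    ("STRING", r'(?:"[^"]*")+'),
    ("EQ", r"=="), ("ASSIGN", r"="), ("NE", r"!=|<>"),
    ("OR", r"or|\|\|"), ("AND", r"and|&&"), ("NOT", r"!|not"),
    ("LTE", r"<="), ("GTE", r">="), ("LT", r"<"), ("GT", r">"),
    ("TIMES", r"\*"), ("DIVIDE", r"/"),
    ("BOOLEAN", r"true|false"),
    ("IDENTIFIER", r"[a-z_][a-z_0-9]*|\[[^\]]+\]"),
    ("REAL", r"[0-9]*\.[0-9]+(?:e[+-]?[0-9]+)?"),
    ("INTEGER", r"[0-9]+"),
    ("PLUS", r"\+"), ("MINUS", r"-"), ("COMMA", r","),
    ("RPAREN", r"\)"), ("LPAREN", r"\("), ("DOT", r"\."), ("CARET", r"\^"),
]
COMPILED = [(name, re.compile(pattern, re.DOTALL)) for name, pattern in RULES]
HIDDEN = {"WS", "COMMENT", "LINE_COMMENT"}  # ->channel(HIDDEN) rules


def tokenize(text):
    """Return (type, text) pairs for tokens on the default channel."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in COMPILED:
            m = regex.match(text, pos)
            # Strict '>' keeps the earlier rule on a tie, as ANTLR does.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            # Where ANTLR would report a token recognition error.
            raise ValueError(f"no rule matches at {pos}: {text[pos]!r}")
        if best_name not in HIDDEN:
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


if __name__ == "__main__":
    sample = '[a].[b] = 1.76 * [Product_DC].[PDC_Inbound_Pallets] * if(product_dc.[PDC_DC] =="US84",1,0)'
    for kind, lexeme in tokenize(sample):
        print(kind, lexeme)
```

Under this model the sample equation tokenizes without errors, starting IDENTIFIER DOT IDENTIFIER ASSIGN REAL and ending STRING COMMA INTEGER COMMA INTEGER RPAREN. Only the real ANTLR tool and runtime can confirm the parser side, but a model like this makes it easy to test a lexer change before regenerating.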