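How do I handle errors properly with ANTLR4 in a compiler for a C-like language?

Time: 2018-11-16 22:18:11

Tags: antlr grammar antlr4

I am working on a compiler for the Daedalus programming language (the scripting language of the Gothic game series). Daedalus has a syntax similar to C. My compiler uses ANTLR4 to build the AST. Here is the grammar I use: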

grammar Daedalus;

// lexer
Const : 'const' | 'CONST';
Var: 'var' | 'VAR';
If : 'if' | 'IF';
Int: 'int' | 'INT';
Else: 'else' | 'ELSE';
Func: 'func' | 'FUNC';
String: 'string' | 'STRING';
Class: 'class' | 'CLASS';
Void: 'void' | 'VOID';
Return: 'return' | 'RETURN';
Float: 'float' | 'FLOAT';
Prototype: 'prototype' | 'PROTOTYPE';
Instance: 'instance' | 'INSTANCE';
Null: 'null' | 'Null';

Identifier : IdStart IdContinue*;
IntegerLiteral : Digit+;
FloatLiteral : PointFloat | ExponentFloat;
StringLiteral : '"' (~["\\\r\n] | '\\' (. | EOF))* '"';

Whitespace : [ \t]+ -> skip;
Newline : ('\r''\n'?| '\n') -> skip;
BlockComment :   '/*' .*? '*/' -> skip;
LineComment :   '//' ~[\r\n]* -> skip ;

// fragments
fragment IdStart : GermanCharacter | [a-zA-Z_];
fragment IdContinue : IdStart | Digit;
fragment GermanCharacter : [\u00DF\u00E4\u00F6\u00FC]; // ß ä ö ü
fragment Digit : [0-9];
fragment PointFloat : Digit* '.' Digit+ | Digit+ '.';
fragment ExponentFloat : (Digit+ | PointFloat) Exponent;
fragment Exponent : [eE] [+-]? Digit+;


//parser
daedalusFile: (( functionDef | constDef | varDecl | classDef | prototypeDef | instanceDef | instanceDecl )';')*?;

functionDef: Func typeReference nameNode parameterList statementBlock;
constDef: Const typeReference (constValueDef | constArrayDef) (',' (constValueDef | constArrayDef) )*;
classDef: Class nameNode '{' ( varDecl ';' )*? '}';
prototypeDef: Prototype nameNode '(' parentReference ')' statementBlock;
instanceDef: Instance nameNode '(' parentReference ')' statementBlock;
instanceDecl: Instance nameNode ( ',' nameNode )*? '(' parentReference ')';
varDecl: Var typeReference (varValueDecl | varArrayDecl) (',' (varValueDecl | varArrayDecl) )* ;

constArrayDef: nameNode '[' arraySize ']' constArrayAssignment;
constArrayAssignment: '=' '{' ( expressionBlock (',' expressionBlock)*? ) '}';

constValueDef: nameNode constValueAssignment;
constValueAssignment: '=' expressionBlock;

varArrayDecl: nameNode '[' arraySize ']';
varValueDecl: nameNode;

parameterList: '(' (parameterDecl (',' parameterDecl)*? )? ')';
parameterDecl: Var typeReference nameNode ('[' arraySize ']')?;
statementBlock: '{' ( ( (statement ';')  | ( ifBlockStatement ( ';' )? ) ) )*? '}';
statement: assignment | returnStatement | constDef | varDecl | expression;
funcCall: nameNode '(' ( funcArgExpression ( ',' funcArgExpression )*? )? ')';
assignment: referenceLeftSide assignmentOperator expressionBlock;
ifCondition: expressionBlock;
elseBlock: Else statementBlock;
elseIfBlock: Else If ifCondition statementBlock;
ifBlock: If ifCondition statementBlock;
ifBlockStatement: ifBlock ( elseIfBlock )*? ( elseBlock )?;
returnStatement: Return ( expressionBlock )?;

funcArgExpression: expressionBlock; // we use this rule to detect function call arguments
expressionBlock: expression; // we use this rule to force the parser to treat the expression as a block

expression
  : '(' expression ')' #bracketExpression
  | oneArgOperator expression #oneArgExpression
  | expression multOperator expression #multExpression
  | expression addOperator expression #addExpression
  | expression bitMoveOperator expression #bitMoveExpression
  | expression compOperator expression #compExpression
  | expression eqOperator expression #eqExpression
  | expression binAndOperator expression #binAndExpression
  | expression binOrOperator expression #binOrExpression
  | expression logAndOperator expression #logAndExpression
  | expression logOrOperator expression #logOrExpression
  | value #valExpression
  ;

arrayIndex : IntegerLiteral | referenceAtom;
arraySize : IntegerLiteral | referenceAtom;

value
  : IntegerLiteral #integerLiteralValue
  | FloatLiteral #floatLiteralValue
  | StringLiteral #stringLiteralValue
  | Null #nullLiteralValue
  | funcCall #funcCallValue
  | reference #referenceValue
  ;

referenceAtom: Identifier ( '[' arrayIndex ']')?;
reference: referenceAtom ( '.' referenceAtom )?;
referenceLeftSide: referenceAtom ( '.' referenceAtom )?;

typeReference:  ( Identifier | Void | Int | Float | String | Func | Instance);

nameNode: Identifier;

parentReference: Identifier;

assignmentOperator:  '=' | '+=' | '-=' | '*=' | '/=';
addOperator: '+' | '-';
bitMoveOperator: '<<' | '>>';
compOperator: '<' | '>' | '<=' | '>=';
eqOperator: '==' | '!=';
oneArgOperator: '-' | '!' | '~' | '+';
multOperator: '*' | '/' | '%';
binAndOperator: '&';
binOrOperator: '|';
logAndOperator: '&&';
logOrOperator: '||';

(Project link: https://github.com/dzieje-khorinis/DaedalusCompiler)

Suppose I want to compile the following incorrect code (the assignment to b is missing its semicolon):

func void test() {
    b = 7
};

ANTLR gives me the following errors:

line 3:0 extraneous input '}' expecting ';'
line 3:2 mismatched input '<EOF>' expecting {'}', '(', '+', '-', '!', '~', Const, Var, If, Return, Null, Identifier, IntegerLiteral, FloatLiteral, StringLiteral}

Parse tree: (image not included)

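The first error is understandable to me: there should be a ; after the 7. Because the semicolon is missing, the parse tree is broken and the following } is treated as part of the statement block. However, I would like }; to be treated as the closing element of the function, so that the only error reported is the missing semicolon after the b = 7 statement.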
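One grammar-level way to get a single, targeted report for the missing semicolon is an ANTLR "error alternative": an extra alternative that accepts the faulty input and reports it itself. The following is only a sketch, not a tested change to this grammar. It assumes the Java target (`notifyErrorListeners` is a method of the Java runtime's `Parser` class; other targets need their own equivalent call), and the message text is made up for illustration.

```antlr
// Sketch of an error alternative for statementBlock (assumes the Java target).
statementBlock
    : '{'
      ( statement ';'
      | ifBlockStatement ';'?
      // Error alternative, listed last so the valid forms win whenever they apply:
      // accept a statement without its ';' and report that explicitly.
      | statement { notifyErrorListeners("missing ';' after statement"); }
      )*?
      '}'
    ;
```

With such an alternative, the } that follows b = 7 should no longer be discarded by error recovery, so }; can close the function as intended. The message is attached to the token that follows the statement, because `notifyErrorListeners(String)` reports at the parser's current token.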

Another problem occurs when I have the following code:

func void test() {

ANTLR reports:

line 1:18 mismatched input '<EOF>' expecting {'}', '(', '+', '-', '!', '~', Const, Var, If, Return, Null, Identifier, IntegerLiteral, FloatLiteral, StringLiteral}

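The error itself seems clear to me, because statements can be defined in the function body. They are not required, though, so ANTLR should simply tell us that }; has to be placed at the end to close the function and make the syntax correct.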
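The reports above that point at `<EOF>` come from a grammar whose start rule `daedalusFile` never mentions `EOF`. A common ANTLR convention, shown here as an untested sketch for this grammar, is to end the start rule with the built-in `EOF` token, so the grammar states explicitly that the definitions must cover the whole file. This alone does not change the wording of the messages; turning the expected-token list into a hint such as "missing };" would most likely still need a custom error listener or error strategy in the target language.

```antlr
// Start rule from the grammar above with an explicit end-of-file anchor;
// nothing else is changed.
daedalusFile
    : ( ( functionDef | constDef | varDecl | classDef
        | prototypeDef | instanceDef | instanceDecl ) ';' )*? EOF
    ;
```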

How can I make error recognition in ANTLR smarter?

0 answers:

No answers