I am using ANTLR 4, and this is a simplified version of the grammar I wrote:
grammar BooleanExpression;

/*******************************
 * Parser Rules
 *******************************/
booleanTerm
    : booleanLiteral (KW_OR booleanLiteral)+
    | booleanLiteral
    ;

id
    : IDENTIFIER
    ;

booleanLiteral
    : KW_TRUE
    | KW_FALSE
    ;

/*******************************
 * Lexer Rules
 *******************************/
KW_TRUE
    : 'true'
    ;

KW_FALSE
    : 'false'
    ;

KW_OR
    : 'or'
    ;

IDENTIFIER
    : (SIMPLE_LATIN)+
    ;

fragment
SIMPLE_LATIN
    : 'A' .. 'Z'
    | 'a' .. 'z'
    ;

WHITESPACE
    : [ \t\n\r]+ -> skip
    ;
I use a BailErrorStrategy and a BailLexer, as shown below:
import org.antlr.v4.runtime.DefaultErrorStrategy;
import org.antlr.v4.runtime.InputMismatchException;
import org.antlr.v4.runtime.Parser;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Token;

public class BailErrorStrategy extends DefaultErrorStrategy {
    /**
     * Instead of recovering from exception e, rethrow it wrapped in a generic
     * IllegalArgumentException so it is not caught by the rule function catches.
     * Exception e is the "cause" of the IllegalArgumentException.
     */
    @Override
    public void recover(Parser recognizer, RecognitionException e) {
        throw new IllegalArgumentException(e);
    }

    /**
     * Make sure we don't attempt to recover inline; if the parser successfully
     * recovers, it won't throw an exception.
     */
    @Override
    public Token recoverInline(Parser recognizer) throws RecognitionException {
        throw new IllegalArgumentException(new InputMismatchException(recognizer));
    }

    /** Make sure we don't attempt to recover from problems in subrules. */
    @Override
    public void sync(Parser recognizer) {
    }

    @Override
    protected Token getMissingSymbol(Parser recognizer) {
        throw new IllegalArgumentException(new InputMismatchException(recognizer));
    }
}
import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.LexerNoViableAltException;
import org.antlr.v4.runtime.RecognitionException;

public class BailLexer extends BooleanExpressionLexer {
    public BailLexer(CharStream input) {
        super(input);
        //removeErrorListeners();
        //addErrorListener(new ConsoleErrorListener());
    }

    @Override
    public void recover(LexerNoViableAltException e) {
        throw new IllegalArgumentException(e); // Bail out
    }

    @Override
    public void recover(RecognitionException re) {
        throw new IllegalArgumentException(re); // Bail out
    }
}
Everything works fine except for one case. I tried the following expression:
true OR false
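I expected this expression to be rejected with an IllegalArgumentException, because the 'or' token must be lowercase, not uppercase. Instead, ANTLR 4 accepts it. The input is tokenized as "KW_TRUE IDENTIFIER KW_FALSE" (which is expected, since uppercase 'OR' is lexed as an IDENTIFIER), but the parser raises no error while processing this token stream: it parses it into a tree containing only "true" and discards the remaining "IDENTIFIER KW_FALSE" tokens. I tried the different prediction modes, but they all behave the same way. I could not see why it works like this, so I did some debugging, which eventually led me to this piece of code in ANTLR: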
ATNConfigSet reach = computeReachSet(previous, t, false);
if ( reach==null ) {
    // if any configs in previous dipped into outer context, that
    // means that input up to t actually finished entry rule
    // at least for SLL decision. Full LL doesn't dip into outer
    // so don't need special case.
    // We will get an error no matter what so delay until after
    // decision; better error message. Also, no reachable target
    // ATN states in SLL implies LL will also get nowhere.
    // If conflict in states that dip out, choose min since we
    // will get error no matter what.
    int alt = getAltThatFinishedDecisionEntryRule(previousD.configs);
    if ( alt!=ATN.INVALID_ALT_NUMBER ) {
        // return w/o altering DFA
        return alt;
    }
    throw noViableAlt(input, outerContext, previous, startIndex);
}
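The line "int alt = getAltThatFinishedDecisionEntryRule(previousD.configs);" returns the second alternative of booleanTerm (because "true" matches the second alternative, "booleanLiteral"), and since that value is not equal to ATN.INVALID_ALT_NUMBER, noViableAlt is not thrown immediately. According to the Java comment there, "We will get an error no matter what so delay until after decision", but it seems that no error is ever thrown.
I really don't know how to make ANTLR report an error in this case. Can someone explain? Any help is appreciated. Thanks.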
Answer 0 (score: 5)
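If your top-level rule does not end with an explicit EOF, ANTLR is not required to parse to the end of the input sequence. Rather than throwing an exception, it simply parses the valid portion of the sequence you gave it.
The following start rule forces it to parse the entire input sequence as a single booleanTerm.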
start : booleanTerm EOF;
Also, BailErrorStrategy is provided by the ANTLR 4 runtime, and it throws a ParseCancellationException, which is more informative than the exception shown in your example.
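To make the EOF point concrete, here is a rough, self-contained sketch in plain Java. It is a hand-written stand-in for the grammar above, not ANTLR's generated code or its prediction algorithm; the class name `PrefixParseDemo` and its methods are hypothetical names chosen for illustration. It shows that a rule without a trailing EOF can succeed after consuming only a valid prefix, while a rule that also requires EOF rejects the same input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: a tiny hand-rolled lexer and parser that
// mimic the grammar from the question. This is not how ANTLR is implemented.
class PrefixParseDemo {

    // Keywords are case-sensitive; any other run of Latin letters is an IDENTIFIER.
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (String word : input.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // empty input yields only EOF
            }
            switch (word) {
                case "true":  tokens.add("KW_TRUE");  break;
                case "false": tokens.add("KW_FALSE"); break;
                case "or":    tokens.add("KW_OR");    break;
                default:
                    if (!word.matches("[A-Za-z]+")) {
                        throw new IllegalArgumentException("Cannot tokenize: " + word);
                    }
                    tokens.add("IDENTIFIER");
            }
        }
        tokens.add("EOF"); // the token stream always ends with EOF
        return tokens;
    }

    private static boolean isLiteral(String token) {
        return token.equals("KW_TRUE") || token.equals("KW_FALSE");
    }

    // booleanTerm : booleanLiteral (KW_OR booleanLiteral)* ;
    // (equivalent to the two alternatives in the question)
    // Returns the index of the first token that was NOT consumed.
    public static int parseBooleanTerm(List<String> tokens) {
        int pos = 0;
        if (!isLiteral(tokens.get(pos))) {
            throw new IllegalArgumentException("Expected boolean literal at token " + pos);
        }
        pos++;
        while (tokens.get(pos).equals("KW_OR")) {
            pos++;
            if (!isLiteral(tokens.get(pos))) {
                throw new IllegalArgumentException("Expected boolean literal at token " + pos);
            }
            pos++;
        }
        return pos; // stops quietly at the first token that does not fit
    }

    // start : booleanTerm EOF ;
    public static int parseStart(List<String> tokens) {
        int pos = parseBooleanTerm(tokens);
        if (!tokens.get(pos).equals("EOF")) {
            throw new IllegalArgumentException(
                "Expected EOF at token " + pos + " but found " + tokens.get(pos));
        }
        return pos;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("true OR false");
        System.out.println(tokens);
        System.out.println("booleanTerm consumed " + parseBooleanTerm(tokens) + " token(s)");
        try {
            parseStart(tokens);
        } catch (IllegalArgumentException e) {
            System.out.println("start rejected the input: " + e.getMessage());
        }
    }
}
```

For "true OR false", `parseBooleanTerm` returns after the first token without complaint, while `parseStart` throws because the next token is an IDENTIFIER rather than EOF. With ANTLR the equivalent change is to invoke the `start` rule as the entry point instead of `booleanTerm`.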