Question

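The ErrorListener mechanism in Antlr4 is great for logging syntax errors and making decisions about them as they occur during a parse, but it could be better for batch error handling after the parse is finished. There are a number of reasons you may want to handle errors after the parse finishes, including: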

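we need a clean way to programmatically check for errors during a parse and handle them after the fact,
sometimes one syntax error causes several others (when not recovered in line, for instance), so it is helpful to group or nest these errors by parent context when displaying output to the user, and you cannot know all the errors until the parse is finished,
you may want to display errors differently to the user depending on how many there are and how severe they are. For example, a single error that exited a rule, or a few errors that were all recovered in line, might just prompt the user to fix those local areas; otherwise you might have the user edit the entire input. You need all the errors to make this determination.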
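As a rough illustration of the last two points, here is a minimal Java sketch of what post-parse batch handling could look like once every error is available. The CollectedError and BatchErrorReport types, their fields, and the display policy are assumptions made up for this example; they are not ANTLR classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one collected syntax error; not an ANTLR type.
final class CollectedError {
    final String parentRule;       // rule that was active when the error occurred
    final boolean recoveredInline; // true if the parser did not exit the rule
    final String message;

    CollectedError(String parentRule, boolean recoveredInline, String message) {
        this.parentRule = parentRule;
        this.recoveredInline = recoveredInline;
        this.message = message;
    }
}

final class BatchErrorReport {
    // Group errors by their parent rule, keeping first-seen order.
    static Map<String, List<CollectedError>> groupByRule(List<CollectedError> errors) {
        Map<String, List<CollectedError>> groups = new LinkedHashMap<>();
        for (CollectedError e : errors) {
            groups.computeIfAbsent(e.parentRule, k -> new ArrayList<>()).add(e);
        }
        return groups;
    }

    // Assumed policy: ask for local fixes when every error was recovered in line,
    // or when the only error is one that exited a rule; otherwise ask for a full edit.
    static String chooseMode(List<CollectedError> errors) {
        if (errors.isEmpty()) {
            return "OK";
        }
        long exited = errors.stream().filter(e -> !e.recoveredInline).count();
        boolean local = exited == 0 || (exited == 1 && errors.size() == 1);
        return local ? "FIX_LOCAL" : "EDIT_ALL";
    }
}
```

Both methods need the complete list, which is why this kind of decision cannot be made from inside a listener callback that sees one error at a time.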

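The bottom line is that we can be smarter about reporting syntax errors and asking users to fix them if we know the full context in which the errors occurred (including other errors). To do this, I have the following three goals: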

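1. a full collection of all the errors from a given parse,
2. context information for each error, and
3. severity and recovery information for each error.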

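I have written code to do #1 and #2, and I am looking for help with #3. I am also going to suggest some minor changes to make #1 and #2 easier for everyone.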

First, to accomplish #1 (a full collection of errors), I created CollectionErrorListener as follows:

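import java.util.ArrayList;
import java.util.List;

import org.antlr.v4.runtime.BaseErrorListener;
import org.antlr.v4.runtime.Parser;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;
import org.antlr.v4.runtime.Token;

public class CollectionErrorListener extends BaseErrorListener {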

    private final List<SyntaxError> errors = new ArrayList<SyntaxError>();

    public List<SyntaxError> getErrors() {
        return errors;
    }

    @Override
    public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol, int line, int charPositionInLine, String msg, RecognitionException e) {
        if (e == null) {
            // e is null when the parser was able to recover in line without exiting the surrounding rule.
            e = new InlineRecognitionException(msg, recognizer, ((Parser)recognizer).getInputStream(), ((Parser)recognizer).getContext(), (Token) offendingSymbol);
        }
        this.errors.add(new SyntaxError(msg, e));
    }  
}

Here is my InlineRecognitionException class:

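import org.antlr.v4.runtime.IntStream;
import org.antlr.v4.runtime.ParserRuleContext;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;
import org.antlr.v4.runtime.Token;

public class InlineRecognitionException extends RecognitionException {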

    public InlineRecognitionException(String message, Recognizer<?, ?> recognizer, IntStream input, ParserRuleContext ctx, Token offendingToken) {
        super(message, recognizer, input, ctx);
        this.setOffendingToken(offendingToken);
    }    
}

Here is my SyntaxError container class:

import org.antlr.v4.runtime.ParserRuleContext;
import org.antlr.v4.runtime.RecognitionException;

public class SyntaxError extends RecognitionException {

    public SyntaxError(String message, RecognitionException e) {
        super(message, e.getRecognizer(), e.getInputStream(), (ParserRuleContext) e.getCtx());
        this.setOffendingToken(e.getOffendingToken());
        this.initCause(e);
    }
}

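This is very similar to the SyntaxErrorListener referred to in 280Z28's answer to "Antlr error/exception handling". I need both the InlineRecognitionException and the SyntaxError wrapper because of how the parameters of CollectionErrorListener.syntaxError are populated.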

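First of all, the RecognitionException parameter "e" is null if the parser recovered from the error in line (without leaving the rule). We cannot simply instantiate a new RecognitionException, because there is no constructor or method that lets us set the offending token from outside the class (setOffendingToken is protected, so only a subclass can call it). In any case, being able to distinguish errors that were recovered in line (using an instanceof test) is useful information for achieving goal #3, so we can use the InlineRecognitionException class to indicate in-line recovery.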

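Next, we need the SyntaxError wrapper class because, even when the RecognitionException "e" is not null (e.g., when recovery is not in line), the value of e.getMessage() is null (for some unknown reason). Therefore, we need to store the msg parameter passed to CollectionErrorListener.syntaxError. Because there is no setMessage() modifier method on RecognitionException, and we cannot simply instantiate a new RecognitionException (we would lose the offending token information, as discussed in the previous paragraph), we are left with subclassing in order to set the message, offending token, and cause appropriately.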

And this mechanism works really well:

    CollectionErrorListener collector = new CollectionErrorListener();
    parser.addErrorListener(collector);
    ParseTree tree = parser.prog();

    //  ...  Later ...
    for (SyntaxError e : collector.getErrors()) {
        // RecognitionExceptionUtil is my custom class discussed next.
        System.out.println(RecognitionExceptionUtil.formatVerbose(e));
    }
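One detail worth spelling out: because every collected error is wrapped in a SyntaxError, the instanceof test for in-line recovery has to be applied to the wrapped cause, not to the SyntaxError itself. The sketch below shows that partitioning step. The Demo* classes are stand-ins with the same shape as the classes above, introduced only so the example compiles without the ANTLR runtime, and RecoveryPartition is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for RecognitionException (assumption: only the message matters here).
class DemoRecognitionException extends RuntimeException {
    DemoRecognitionException(String message) {
        super(message);
    }
}

// Stand-in for InlineRecognitionException: marks errors recovered in line.
class DemoInlineException extends DemoRecognitionException {
    DemoInlineException(String message) {
        super(message);
    }
}

// Stand-in for SyntaxError: keeps the message and records the original as its cause.
class DemoSyntaxError extends DemoRecognitionException {
    DemoSyntaxError(String message, DemoRecognitionException cause) {
        super(message);
        initCause(cause);
    }
}

final class RecoveryPartition {
    final List<DemoSyntaxError> inline = new ArrayList<>();
    final List<DemoSyntaxError> exitedRule = new ArrayList<>();

    // Split collected errors by how the parser recovered, judged from the cause.
    static RecoveryPartition of(List<DemoSyntaxError> errors) {
        RecoveryPartition p = new RecoveryPartition();
        for (DemoSyntaxError e : errors) {
            if (e.getCause() instanceof DemoInlineException) {
                p.inline.add(e);
            } else {
                p.exitedRule.add(e);
            }
        }
        return p;
    }
}
```

With the real classes, the same check would read `e.getCause() instanceof InlineRecognitionException` inside the loop over collector.getErrors().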

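This brings me to my next point. Formatting the output from a RecognitionException is kind of annoying. Chapter 9 of The Definitive ANTLR 4 Reference shows that displaying quality error messages means you need to split the input into lines, reverse the rule invocation stack, and piece together a lot of information from the offending token to explain where the error occurred. And the following call does not work if you are reporting errors after the parse is finished: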

// The following doesn't work if you are not reporting during the parse because the
// parser context is lost from the RecognitionException "e" recognizer.
List<String> stack = ((Parser)e.getRecognizer()).getRuleInvocationStack();

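The problem is that we have lost the RuleContext, which getRuleInvocationStack needs. Luckily, RecognitionException keeps a copy of our context, and getRuleInvocationStack has an overload that takes a context as a parameter, so here is how we get the rule invocation stack after the parse is finished: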

// Pass in the context from RecognitionException "e" to get the rule invocation stack
// after the parse is finished.
List<String> stack = ((Parser)e.getRecognizer()).getRuleInvocationStack(e.getCtx());
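A rough sketch of why the saved context is enough: as far as I can tell, the stack is derived by following parent links from the given context up to the root, so it needs nothing from the parser's current state. DemoContext and RuleStackDemo below are hypothetical stand-ins used to show that walk without the ANTLR runtime; they are not the runtime's own classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a rule context: a rule name plus the invoking context.
final class DemoContext {
    final String ruleName;
    final DemoContext parent; // null for the start rule

    DemoContext(String ruleName, DemoContext parent) {
        this.ruleName = ruleName;
        this.parent = parent;
    }
}

final class RuleStackDemo {
    // Walk from the given context up to the root; the innermost rule comes first.
    static List<String> invocationStack(DemoContext ctx) {
        List<String> stack = new ArrayList<>();
        for (DemoContext c = ctx; c != null; c = c.parent) {
            stack.add(c.ruleName);
        }
        return stack;
    }

    // Reverse the walk so the start rule is listed first, as in an error report.
    static List<String> outermostFirst(DemoContext ctx) {
        List<String> stack = invocationStack(ctx);
        Collections.reverse(stack);
        return stack;
    }
}
```

The innermost-first order of the walk is the reason a reporting helper typically reverses the list before printing it.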

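In general, it would be especially nice if RecognitionException had some convenience methods to make error reporting friendlier. Here is my first attempt at a utility class of methods that could become part of RecognitionException: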

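import java.util.Collections;
import java.util.List;

import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Parser;
import org.antlr.v4.runtime.RecognitionException;

public class RecognitionExceptionUtil {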

    public static String formatVerbose(RecognitionException e) {
        return String.format("ERROR on line %s:%s => %s%nrule stack: %s%noffending token %s => %s%n%s",
                getLineNumberString(e),
                getCharPositionInLineString(e),
                e.getMessage(),
                getRuleStackString(e),
                getOffendingTokenString(e),
                getOffendingTokenVerboseString(e),
                getErrorLineStringUnderlined(e).replaceAll("(?m)^|$", "|"));
    }

    public static String getRuleStackString(RecognitionException e) {
        if (e == null || e.getRecognizer() == null
                || e.getCtx() == null
                || e.getRecognizer().getRuleNames() == null) {
            return "";
        }
        List<String> stack = ((Parser)e.getRecognizer()).getRuleInvocationStack(e.getCtx());
        Collections.reverse(stack);
        return stack.toString();
    }

    public static String getLineNumberString(RecognitionException e) {
        if (e == null || e.getOffendingToken() == null) {
            return "";
        }
        return String.format("%d", e.getOffendingToken().getLine());
    }

    public static String getCharPositionInLineString(RecognitionException e) {
        if (e == null || e.getOffendingToken() == null) {
            return "";
        }
        return String.format("%d", e.getOffendingToken().getCharPositionInLine());
    }

    public static String getOffendingTokenString(RecognitionException e) {
        if (e == null || e.getOffendingToken() == null) {
            return "";
        }
        return e.getOffendingToken().toString();
    }

    public static String getOffendingTokenVerboseString(RecognitionException e) {
        if (e == null || e.getOffendingToken() == null) {
            return "";
        }
        return String.format("at tokenStream[%d], inputString[%d..%d] = '%s', tokenType<%d> = %s, on line %d, character %d",
                e.getOffendingToken().getTokenIndex(),
                e.getOffendingToken().getStartIndex(),
                e.getOffendingToken().getStopIndex(),
                e.getOffendingToken().getText(),
                e.getOffendingToken().getType(),
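                // Token.EOF has type -1, which would index before the start of the token-name array.
                e.getOffendingToken().getType() < 0 ? "EOF"
                        : e.getRecognizer().getTokenNames()[e.getOffendingToken().getType()],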
                e.getOffendingToken().getLine(),
                e.getOffendingToken().getCharPositionInLine());
    }

    public static String getErrorLineString(RecognitionException e) {
        if (e == null || e.getRecognizer() == null
                || e.getRecognizer().getInputStream() == null
                || e.getOffendingToken() == null) {
            return "";
        }
        CommonTokenStream tokens =
            (CommonTokenStream)e.getRecognizer().getInputStream();
        String input = tokens.getTokenSource().getInputStream().toString();
        String[] lines = input.split("\r?\n");
        return lines[e.getOffendingToken().getLine() - 1];
    }

    public static String getErrorLineStringUnderlined(RecognitionException e) {
        String errorLine = getErrorLineString(e);
        if (errorLine.isEmpty()) {
            return errorLine;
        }
        // replace tabs with single space so that charPositionInLine gives us the
        // column to start underlining.
        errorLine = errorLine.replaceAll("\t", " ");
        StringBuilder underLine = new StringBuilder(String.format("%" + errorLine.length() + "s", ""));
        int start = e.getOffendingToken().getStartIndex();
        int stop = e.getOffendingToken().getStopIndex();
        if ( start>=0 && stop>=0 ) {
            for (int i=0; i<=(stop-start); i++) {
                underLine.setCharAt(e.getOffendingToken().getCharPositionInLine() + i, '^');
            }
        }
        return String.format("%s%n%s", errorLine, underLine);
    }
}

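My RecognitionExceptionUtil leaves a lot to be desired (it always returns strings, does not check that the recognizer is of type Parser, does not handle multiple lines in getErrorLineString, etc.), but I hope you get the idea.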
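As one example of tightening these helpers, here is a small sketch of an underlining routine that clamps the markers to the visible line, so that a token running past the end of the line (a multi-line token, say) cannot index out of bounds. UnderlineDemo and its parameters are illustrative names, and it takes plain values rather than a RecognitionException so that it can run on its own.

```java
final class UnderlineDemo {
    // Returns the line followed by a marker line with '^' under the columns
    // [column, column + length), limited to the characters the line actually has.
    static String underline(String line, int column, int length) {
        // Tabs become single spaces so that a character offset matches a display column.
        String shown = line.replace('\t', ' ');
        StringBuilder marks = new StringBuilder();
        for (int i = 0; i < shown.length(); i++) {
            boolean inToken = i >= column && i < column + length;
            marks.append(inToken ? '^' : ' ');
        }
        return shown + "\n" + marks;
    }
}
```

For a real error, column would come from getCharPositionInLine() and length from getStopIndex() - getStartIndex() + 1 on the offending token.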

Summary of my suggestions for a future version of ANTLR:

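Always populate the "RecognitionException e" parameter of ANTLRErrorListener.syntaxError (including the OffendingToken) so that we can collect these exceptions for batch handling after the parse. While you are at it, make sure that e.getMessage() returns the value currently passed in the msg parameter.
Add a constructor for RecognitionException that includes the OffendingToken.
Remove the other parameters from the method signature of ANTLRErrorListener.syntaxError, since they would then be extraneous and lead to confusion.
Add convenience methods to RecognitionException for common things such as getCharPositionInLine, getLineNumber, getRuleStack, and the rest of the methods in my RecognitionExceptionUtil class defined above. Of course, these will have to check for null, and some of them will also have to check that the recognizer is of type Parser.
When calling ANTLRErrorListener.syntaxError, clone the recognizer so that we do not lose the context when the parse finishes (and so that we can more easily call getRuleInvocationStack).
If you clone the recognizer, then you will not need to store the context in RecognitionException. We can make two changes to e.getCtx(): first, rename it to e.getContext() to make it consistent with Parser.getContext(), and second, make it a convenience method that delegates to the recognizer we already have in RecognitionException (checking that the recognizer is an instance of Parser).
Include information in RecognitionException about the severity of the error and how the parser recovered. This is my goal #3 from the beginning. It would be great to categorize syntax errors by how well the parser handled them. Did this error blow up the entire parse or just show up as a blip in line? How many tokens were skipped or inserted, and which ones?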

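So, I am looking for feedback regarding my three goals, and especially for any suggestions on gathering more information for goal #3: severity and recovery information for each error.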

Answer 1

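I posted these suggestions to the Antlr4 GitHub issue list and received the following reply. I still believe the ANTLRErrorListener.syntaxError method contains redundant/confusing parameters and requires a lot of API knowledge to use properly, but I understand the decision. Here is the link to the issue and a copy of the text of the response: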

From: https://github.com/antlr/antlr4/issues/396

Regarding your suggestions:

Populating the RecognitionException e argument to syntaxError: As mentioned in the documentation:

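The RecognitionException is non-null for all syntax errors except when we discover mismatched token errors that we can recover from in-line, without returning from the surrounding rule (via the single token insertion and deletion mechanism).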

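Adding a constructor to RecognitionException with the offending token: This is not really relevant to this issue, and would be addressed separately (if at all).
Removing parameters from syntaxError: This would not only introduce breaking changes for users who have implemented this method in previous releases of ANTLR 4, but it would also eliminate the ability to report the available information for errors that occur inline (i.e. errors for which no RecognitionException is available).
Convenience methods in RecognitionException: This is not really relevant to this issue, and would be addressed separately (if at all). (Further note: it is difficult enough documenting an API. This just adds more ways to do things that are already easily accessible, so I am opposed to this change.)
Cloning the recognizer when calling syntaxError: This is a performance-critical method, so new objects are only created when absolutely necessary.
"If you clone the recognizer": The recognizer will never be cloned before calling syntaxError.
This information can be stored in an associative map in your implementation of ANTLRErrorListener and/or ANTLRErrorStrategy if necessary for your application.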

I am closing this issue for now since I do not see any action items from this list requiring changes to the runtime.
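The associative-map idea from the reply could look roughly like the sketch below. This is only one possible reading of that suggestion: RecoveryNotes, its Severity enum, and the record/severityOf/count methods are hypothetical names, and the map is keyed on Throwable so the example runs without the ANTLR runtime.

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class RecoveryNotes {
    // Assumed two-level classification; a real application may want more detail.
    enum Severity { RECOVERED_INLINE, EXITED_RULE }

    // Keyed by identity so two distinct errors with equal messages stay separate.
    private final Map<Throwable, Severity> notes = new IdentityHashMap<>();

    // Intended to be called from a listener or error strategy when it sees an error.
    void record(Throwable error, boolean recoveredInline) {
        notes.put(error, recoveredInline ? Severity.RECOVERED_INLINE : Severity.EXITED_RULE);
    }

    // Returns null if nothing was recorded for this error.
    Severity severityOf(Throwable error) {
        return notes.get(error);
    }

    // Number of recorded errors with the given severity.
    long count(Severity severity) {
        return notes.values().stream().filter(s -> s == severity).count();
    }
}
```

In the CollectionErrorListener above, the natural place to call record would be inside syntaxError, where a null "e" already signals in-line recovery.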

Accumulate/Collect Errors via ErrorListener to process after the Parse
