Question

I am using ANTLR4 to parse code whose comments begin with an asterisk, for example:

* This is a comment

At first, multiplication expressions were being mistaken for these comments, so I decided to write my lexer rule as:

LINE_COMMENT : '\r\n' '*' ~[\r\n]* ;

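This forces a newline before the asterisk, so the lexer does not look at 2 * 3 and take '* 3' as a comment. It worked fine until I had code that starts with a comment on the very first line, which has no newline in front of it. For example: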

* This is the first line of the code's file\r\n
* This is the second line of the codes's file\r\n
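The first-line problem can be reproduced outside ANTLR. The sketch below is only a regex analogue of the LINE_COMMENT rule above, not the ANTLR lexer itself; the class name LineCommentSketch and both pattern names are made up for illustration. It compares a pattern that demands a preceding '\r\n' with one anchored at any line start.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LineCommentSketch {
    // Regex analogue of: LINE_COMMENT : '\r\n' '*' ~[\r\n]* ;
    static final Pattern NEWLINE_PREFIXED = Pattern.compile("\\r\\n\\*[^\\r\\n]*");

    // Alternative: anchor at any line start (including the first line),
    // allowing leading spaces or tabs before the asterisk.
    static final Pattern LINE_ANCHORED = Pattern.compile("(?m)^[ \\t]*\\*[^\\r\\n]*");

    /** Counts how many comments the given pattern finds in the text. */
    static int countComments(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }
}
```

On the two-line example, the newline-prefixed pattern misses the first comment, while the line-anchored one finds both; neither treats the '* 3' in 2 * 3 as a comment.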

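I also tried the {getCharPositionInLine == x}? semantic predicate to make sure a comment is recognized only when just an asterisk or spaces/tabs come before it on the current line. That works when I generate the parser with

antlr4 *.g4

but not with the JavaScript parser generated by

antlr4 -Dlanguage=JavaScript *.g4

Is there a way to get the same result as {getCharPositionInLine == x}? in my JavaScript parser, or some other way to prevent multiplication from being recognized as a comment? I should also mention that this language does not use semicolons at the end of a line.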

I tried this simple grammar, but had no luck.

grammar wow;

program : expression | Comment ;
expression : expression '*' expression
           | NUMBER ;

Comment : '*' ~[\r\n]*;
NUMBER : [0-9]+ ;
Asterisk : '*' ;
Space : ' ' -> skip;

with this test file, test.txt:

5 * 5

Answer 1

Make the comment rule match at least one more non-space character; otherwise it can match exactly the same input as the Asterisk rule. Like this:

Comment: '*' ' '* ~[\r\n]+;
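To see what this change does, here is a rough simulation of how a lexer picks a rule: the longest match wins, and a tie goes to the rule defined first. It uses java.util.regex as a stand-in for the ANTLR lexer, and the names LongestMatchSketch, winner and rules are invented for the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LongestMatchSketch {
    static final String ORIGINAL_COMMENT = "\\*[^\\r\\n]*";  // '*' ~[\r\n]*
    static final String REVISED_COMMENT = "\\* *[^\\r\\n]+"; // '*' ' '* ~[\r\n]+

    /** Builds the two competing rules in grammar order: Comment first, then Asterisk. */
    static Map<String, Pattern> rules(String commentRegex) {
        Map<String, Pattern> rules = new LinkedHashMap<>(); // keeps definition order
        rules.put("Comment", Pattern.compile(commentRegex));
        rules.put("Asterisk", Pattern.compile("\\*"));
        return rules;
    }

    /** Returns the rule that wins at the start of the input, or null if none matches. */
    static String winner(Map<String, Pattern> rules, String input) {
        String best = null;
        int bestLength = 0;
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // Only a strictly longer match replaces the current winner,
            // so ties go to the rule that was defined first.
            if (m.lookingAt() && m.end() > bestLength) {
                best = rule.getKey();
                bestLength = m.end();
            }
        }
        return best;
    }
}
```

In this simulation a lone "*" goes to Comment under the original rule (a tie, and Comment comes first) but to Asterisk under the revised rule. Note that "* 5" is still a longer match for Comment in both versions, so on its own this change only resolves the lone-asterisk tie.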

Answer 2

Do comments have to start at the beginning of a line?

If so, you can check for that with this._tokenStartCharPositionInLine == 0 and use a lexer rule like this:

Comment : '*' ~[\r\n]* {this._tokenStartCharPositionInLine == 0}?;
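The idea behind the column check can be tried without ANTLR. The following is a minimal sketch, assuming lines end in '\n' or '\r\n'; the class LineStartCheck and its method are hypothetical, and the local variable column merely plays the role of the token's start position in the line.

```java
import java.util.ArrayList;
import java.util.List;

final class LineStartCheck {
    /** Returns the index of every '*' that sits at column 0 of its line. */
    static List<Integer> commentStarts(String text) {
        List<Integer> starts = new ArrayList<>();
        int column = 0; // position of the current character within its line
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '*' && column == 0) {
                starts.add(i); // an asterisk at the start of a line opens a comment
            }
            // A line feed starts a new line; any other character advances the column.
            column = (c == '\n') ? 0 : column + 1;
        }
        return starts;
    }
}
```

For the input "* first\r\n2 * 3\r\n* second" only the asterisks at indices 0 and 16 qualify; the one inside 2 * 3 sits at column 2 and is left for the multiplication rule. The field name _tokenStartCharPositionInLine comes from the Java runtime; the JavaScript runtime's Lexer.js appears to call the same value _tokenStartColumn, but treat that name as an assumption and confirm it against the Lexer.js shipped with your ANTLR version.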

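If not, then you should keep track of whether the previous token is one that can be multiplied (for example, your NUMBER rule), so you would write something like this (Java code):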

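@lexer::header {
    // Imports needed by the members block below.
    import java.util.HashSet;
    import java.util.Set;
}

@lexer::members {
    // Token types after which a '*' should be read as multiplication.
    private static final Set<Integer> MULTIPLIABLE_TOKENS = new HashSet<>();
    static {
        MULTIPLIABLE_TOKENS.add(NUMBER);
    }

    // True when the last significant token can be the left operand of '*'.
    private boolean canBeMultiplied = false;

    @Override
    public void emit(final Token token) {
        final int type = token.getType();
        // Whitespace and Newline stand for whatever whitespace/newline token types your grammar defines.
        if (type != Whitespace && type != Newline) {  // skip ws tokens from consideration
            canBeMultiplied = MULTIPLIABLE_TOKENS.contains(type);
        }
        super.emit(token);
    }

}

// The predicate rejects a comment right after a multipliable token.
Comment : {!canBeMultiplied}? '*' ~[\r\n]*;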
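The previous-token idea can also be tried in isolation. Below is a small hand-written tokenizer, not generated by ANTLR, that keeps the same canBeMultiplied flag; the class PreviousTokenSketch and the "TYPE:text" output format are invented for this sketch, and it only knows numbers, asterisks and whitespace.

```java
import java.util.ArrayList;
import java.util.List;

final class PreviousTokenSketch {
    /** Splits the text into NUMBER, ASTERISK and COMMENT tokens, written as "TYPE:text". */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        boolean canBeMultiplied = false; // same role as the flag in the lexer members
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                i++; // whitespace and newlines leave the flag untouched
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < text.length() && Character.isDigit(text.charAt(i))) {
                    i++;
                }
                tokens.add("NUMBER:" + text.substring(start, i));
                canBeMultiplied = true; // a number can be the left operand of '*'
            } else if (c == '*' && !canBeMultiplied) {
                int start = i;
                while (i < text.length() && text.charAt(i) != '\r' && text.charAt(i) != '\n') {
                    i++; // a comment runs to the end of the line
                }
                tokens.add("COMMENT:" + text.substring(start, i));
            } else if (c == '*') {
                tokens.add("ASTERISK:*");
                canBeMultiplied = false; // an operator cannot itself be multiplied
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character at index " + i);
            }
        }
        return tokens;
    }
}
```

For "* note\r\n5 * 5" this yields a COMMENT followed by NUMBER, ASTERISK, NUMBER. Because newlines do not reset the flag (mirroring the emit override above), a line that ends in a number followed by a line that starts with an asterisk would be read as multiplication; whether that is acceptable depends on the language being parsed.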

Update

If you need the JavaScript counterparts of these functions, look in the runtime sources -> Lexer.js

How do I fix an ANTLR parser so that it separates comments from multiplication?