Question

I am trying to use a semantic predicate in a lexer to look ahead one token, but somehow I can't get it right. Here is what I have:

Lexer grammar

lexer grammar TLLexer;

DirStart
    : { getCharPositionInLine() == 0 }? '#dir'
    ;

DirEnd
    : { getCharPositionInLine() == 0 }? '#end'
    ;

Cont
    : 'contents' [ \t]* -> mode(CNT)
    ;

WS
    : [ \t]+ -> channel(HIDDEN)
    ;

NL
    : '\r'? '\n'
    ;

mode CNT;

CNT_DirEnd
    : '#end' [ \t]* '\n'?
      { System.out.println("--matched end--"); }
    ;

CNT_LastLine
    : ~ '\n'* '\n'
      { _input.LA(1) == CNT_DirEnd }? -> mode(DEFAULT_MODE)
    ;

CNT_Line
    : ~ '\n'* '\n'
    ;

Parser grammar

parser grammar TLParser;

options { tokenVocab = TLLexer; }

dirs
    : ( dir
      | NL
      )*
    ;

dir
    : DirStart Cont 
      contents
      DirEnd
    ;

contents
    : CNT_Line* CNT_LastLine
    ;

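Basically, each line in mode CNT is free-form, but it never begins with #end followed by optional whitespace. I want to keep matching the #end tag in the default lexer mode.

My test input is as follows: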

#dir contents
 ..line..
#end

If I run this in grun, I get the following:

$ grun TL dirs test.txt 
--matched end--
line 3:0 extraneous input '#end\n' expecting {CNT_LastLine, CNT_Line}

So CNT_DirEnd clearly gets matched, but somehow the predicate does not detect it.

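I know that this particular task does not require a semantic predicate, but that is just the part that does not work. The actual parser could be written without a predicate by simply moving the matching of the #end tag into mode CNT, but it would be a lot less clean.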

Thanks,
Kesha.

Answer 1

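I think I figured it out. The member _input represents the characters of the raw input, so _input.LA returns characters, not lexer token IDs (is that the correct term?). Either way, the numbers the lexer returns to the parser have nothing to do with the values returned by _input.LA, so the predicate fails unless, by some strange luck, the character value returned by _input.LA(1) happens to equal the lexer ID of CNT_DirEnd.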
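To make the mismatch concrete, here is a minimal standalone sketch that does not use ANTLR at all. The la method is a hypothetical stand-in that mimics how a character stream's LA(i) behaves for i >= 1, and the value 6 for CNT_DIR_END is an assumption based on tokens being numbered in definition order; the real value comes from the generated lexer.

```java
public class LookaheadDemo {
    // Hypothetical token type. The actual number is assigned by the
    // generated lexer and may differ.
    public static final int CNT_DIR_END = 6;

    // Stand-in for a character stream's LA(i), i >= 1: returns the i-th
    // upcoming character as an int, or -1 at end of input.
    public static int la(String input, int pos, int i) {
        int idx = pos + i - 1;
        return idx < input.length() ? input.charAt(idx) : -1;
    }

    public static void main(String[] args) {
        String input = " ..line..\n#end\n";
        int pos = input.indexOf('\n') + 1; // just after the first line

        int next = la(input, pos, 1);
        // The lookahead yields the character '#', not a token type.
        System.out.println("LA(1) = " + next + " ('" + (char) next + "')");
        System.out.println("equals CNT_DIR_END? " + (next == CNT_DIR_END));
    }
}
```

The comparison in the original predicate is therefore between a character code and an unrelated token number, which is why it never holds for this input.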

I modified the lexer as shown below, and now it works, even though it is not as elegant as I would like (maybe someone knows a better way?):

lexer grammar TLLexer;

@lexer::members {
    private static final String END_DIR = "#end";

    private boolean isAtEndDir() {
        StringBuilder sb = new StringBuilder();

        int n = 1;
        int ic;

        // read characters until EOF
        while ((ic = _input.LA(n++)) != -1) {
            char c = (char) ic;
            // we're interested in the next line only
            if (c == '\n') break;
            if (c == '\r') continue;
            sb.append(c);
        }

        // Does the line begin with #end ?
        if (sb.indexOf(END_DIR) != 0) return false;
        // Is the #end followed by whitespace only?
        for (int i = END_DIR.length(); i < sb.length(); i++) {
            switch (sb.charAt(i)) {
            case ' ':
            case '\t':
                continue;
            default: return false;
            }
        }
        return true;
    }
}

[skipped .. nothing changed in the default mode]

mode CNT;

/* removed CNT_DirEnd */

CNT_LastLine
    : ~ '\n'* '\n'
      { isAtEndDir() }? -> mode(DEFAULT_MODE)
    ;

CNT_Line
    : ~ '\n'* '\n'
    ;
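For anyone who wants to check the lookahead logic without generating a lexer, the following is a rough standalone sketch of the same test, operating on a plain String and a position instead of _input. The class name EndDirCheck and the (input, pos) signature are made up for illustration. One difference from the version above: it only strips a trailing '\r' rather than skipping every '\r' in the line.

```java
public class EndDirCheck {
    private static final String END_DIR = "#end";

    // Returns true if the line starting at pos is "#end" followed only by
    // spaces or tabs. A '\r' directly before the line break is ignored.
    public static boolean isAtEndDir(String input, int pos) {
        int eol = input.indexOf('\n', pos);
        String line = input.substring(pos, eol < 0 ? input.length() : eol);
        if (line.endsWith("\r")) {
            line = line.substring(0, line.length() - 1);
        }
        if (!line.startsWith(END_DIR)) return false;
        for (int i = END_DIR.length(); i < line.length(); i++) {
            char c = line.charAt(i);
            if (c != ' ' && c != '\t') return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String input = " ..line..\n#end\n";
        int secondLine = input.indexOf('\n') + 1;
        System.out.println(isAtEndDir(input, 0));          // content line
        System.out.println(isAtEndDir(input, secondLine)); // "#end" line
    }
}
```

Calling it with the position just after a matched line corresponds to what the predicate in CNT_LastLine asks: whether the next line is the #end tag.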

ANTLR4 lexer semantic predicate issue
