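I'm using this ANTLR 3 grammar, with ANTLRWorks to test it.
But I can't figure out why some parts of the input text are omitted.
I would like to rewrite the grammar so that every element of the source file (the input), such as lparen, keywords, semicolon, and so on, shows up in the AST / CST.
I've tried everything I could think of, without success. Can someone with ANTLR experience help me?
Parse tree: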
Answer 0 (score: 1)
I managed to narrow it down to the semic rule:
/*
This rule handles semicolons reported by the lexer and situations where the ECMA 3 specification states there should be semicolons automatically inserted.
The auto semicolons are not actually inserted but this rule behaves as if they were.
In the following situations an ECMA 3 parser should auto insert absent but grammatically required semicolons:
- the current token is a right brace
- the current token is the end of file (EOF) token
- there is at least one end of line (EOL) token between the current token and the previous token.
The RBRACE is handled by matching it but not consuming it.
The EOF needs no further handling because it is not consumed by default.
The EOL situation is handled by promoting the EOL or MultiLineComment with an EOL present from off channel to on channel
and thus making it parseable instead of handling it as white space. This promoting is done in the action promoteEOL.
*/
semic
@init
{
    // Mark current position so we can unconsume a RBRACE.
    int marker = input.mark();
    // Promote EOL if appropriate
    promoteEOL(retval);
}
    : SEMIC
    | EOF
    | RBRACE { input.rewind(marker); }
    | EOL | MultiLineComment // (with EOL in it)
    ;
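So the EVIL semicolon insertion strikes again!
I'm not quite sure, but I think these mark/rewind calls are out of sync. The @init block is executed both when the rule is entered for branch selection and when it is entered to perform the actual match. So it creates a lot of marks but never cleans them up. I don't know why that messes up the parse tree the way it does, though.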
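To picture the suspicion above, here is a toy Java model of a stream that keeps its marks on a stack. `ToyStream` and `MarkLeakDemo` are hypothetical names made up for this sketch and are not ANTLR runtime classes; the real `input.mark()` / `input.rewind()` may well behave differently, so this only illustrates the hypothesis and does not confirm it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy stand-in for a token stream (an assumption, NOT the ANTLR runtime).
class ToyStream {
    private int index = 0;
    private final Deque<Integer> marks = new ArrayDeque<>();

    int index() { return index; }
    void consume() { index++; }

    // Remember the current position; the returned handle is the stack depth.
    int mark() {
        marks.push(index);
        return marks.size();
    }

    // Return to the marked position, discarding that mark and any newer ones.
    void rewind(int marker) {
        while (marks.size() > marker) {
            marks.pop();
        }
        index = marks.pop();
    }

    int outstandingMarks() { return marks.size(); }
}

class MarkLeakDemo {
    // Mimics the shape of the original rule: mark on every entry,
    // rewind only when the RBRACE alternative is taken.
    static void semicWithMark(ToyStream in, boolean isRbrace) {
        int marker = in.mark();
        in.consume();
        if (isRbrace) {
            in.rewind(marker);
        }
    }

    public static void main(String[] args) {
        ToyStream in = new ToyStream();
        semicWithMark(in, false); // e.g. matched SEMIC: mark is never released
        semicWithMark(in, false); // e.g. matched EOL: mark is never released
        semicWithMark(in, true);  // matched RBRACE: only this mark is cleaned up
        System.out.println("outstanding marks: " + in.outstandingMarks());
    }
}
```

In this toy model, every entry that does not take the RBRACE alternative leaves one mark behind, which is the kind of imbalance suspected above.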
Anyway, here is a working version of the same rule:
semic
@init
{
    // Promote EOL if appropriate
    promoteEOL(retval);
}
    : SEMIC
    | EOF
    | { int pos = input.index(); } RBRACE { input.seek(pos); }
    | EOL | MultiLineComment // (with EOL in it)
    ;
It's simpler and doesn't use the mark/rewind mechanism.
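But there's a catch: when a semicolon is inserted before a closing brace, the semic rule in the parse tree will have } as a child node. Try removing the semicolon after i-- and look at the result. You will have to detect this case and handle it in your code. semic should contain either a ; token or an EOL (which means a semicolon was silently inserted at that point).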
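One possible way to do that detection is sketched below in Java. It assumes your tree walker can give you the token type name of the single child of a semic node as a string; `SemicInspector`, `Kind` and `classify` are hypothetical names for this sketch. Treating EOF and MultiLineComment the same way as EOL is an assumption based on the alternatives of the rule, not something the grammar documents.

```java
// Hypothetical helper: decides whether a semic node stands for a real ';'
// or for a semicolon that was silently inserted.
class SemicInspector {
    enum Kind { EXPLICIT, INSERTED }

    // childType is the token type name of the semic node's only child
    // (assumed to be available from whatever tree representation you use).
    static Kind classify(String childType) {
        switch (childType) {
            case "SEMIC":
                return Kind.EXPLICIT;
            case "RBRACE":            // matched but un-consumed: belongs to the enclosing block
            case "EOL":
            case "MultiLineComment":  // assumed to contain an EOL
            case "EOF":
                return Kind.INSERTED;
            default:
                throw new IllegalArgumentException("unexpected child of semic: " + childType);
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("SEMIC"));
        System.out.println(classify("RBRACE"));
    }
}
```

When `classify` returns `INSERTED` for an RBRACE child, the brace should be ignored under semic, because the same token is matched again by the enclosing rule.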