How should I limit the length of an ID token in ANTLR?

Date: 2015-11-12 10:48:25

Tags: java regex antlr4

This should be fairly simple. I'm working on a lexer grammar using ANTLR and want to limit the maximum length of variable identifiers to 32 characters. I tried to do this with the following line (using normal regex syntax):

ID : ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9'|'_'){0,31};

No errors in code generation, but compilation failed due to a line in the generated code that was simply:

0,31

Evidently ANTLR is taking the text between the braces and placing it in the accept state area along with the print line. I searched the ANTLR site and found no example of, or reference to, an equivalent expression. What should the syntax of this expression be?

1 answer:

Answer 0 (score: 2)

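ANTLR4 does not support the range quantifier syntax {a,b}. Beyond that, I am not sure it is a good idea to put this constraint in the lexer. Let me explain. Constraints you add in the lexer govern token recognition, so if your string is longer than 32 characters, it will not be recognized as an ID token. That is not ideal, because the string may then be recognized as some other token, which can make the parsing phase fail.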
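To illustrate the risk of an over-long identifier being split, here is a rough analogy using java.util.regex rather than ANTLR itself. The class name BoundedIdDemo and the method consumedLength are made up for this sketch. It only shows how a rule capped at 32 characters would consume the first 32 characters of a longer name and leave the rest to be matched as something else. How a real ANTLR lexer behaves also depends on the other rules in the grammar.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BoundedIdDemo {
    // Regex analogue of the intended rule: one letter, then at most
    // 31 letters, digits, or underscores (32 characters in total).
    static final Pattern BOUNDED_ID =
            Pattern.compile("[a-zA-Z][a-zA-Z0-9_]{0,31}");

    // Returns how many leading characters of the input the bounded rule
    // would consume, or 0 if the input does not start with an identifier.
    static int consumedLength(String input) {
        Matcher m = BOUNDED_ID.matcher(input);
        return m.lookingAt() ? m.end() : 0;
    }

    public static void main(String[] args) {
        // Build a 40-character identifier, 8 characters over the limit.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 40; i++) {
            sb.append('a');
        }
        String longName = sb.toString();

        int consumed = consumedLength(longName);
        System.out.println("consumed as ID: " + consumed + " characters");
        System.out.println("left for the next token: " + longName.substring(consumed));
    }
}
```

The leftover characters are what a lexer would then try to match against its remaining rules, which is how one long name can end up as two unrelated tokens.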

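The solution is to drop this length constraint from the grammar and handle it in a Java ANTLR4 Listener or Visitor instead, for example by throwing an exception or reporting an error when the length is greater than 32 characters.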
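A minimal sketch of that length check follows, written as plain Java so it runs without the ANTLR runtime. The names IdLengthCheck, MAX_ID_LENGTH, and findOverlongIds are hypothetical. In a real listener or visitor the strings would presumably come from calling getText() on each ID token, with the grammar rule left unbounded. Here they are stand-in literals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class IdLengthCheck {
    // Limit taken from the question: identifiers may be at most 32 characters.
    static final int MAX_ID_LENGTH = 32;

    // Returns the identifiers that exceed the limit, in input order.
    // Assumption: idTexts holds the text of tokens already lexed as ID.
    static List<String> findOverlongIds(List<String> idTexts) {
        List<String> overlong = new ArrayList<>();
        for (String text : idTexts) {
            if (text.length() > MAX_ID_LENGTH) {
                overlong.add(text);
            }
        }
        return overlong;
    }

    public static void main(String[] args) {
        // Stand-in identifier texts; the second one is 36 characters long.
        List<String> ids = Arrays.asList(
                "counter",
                "abcdefghijklmnopqrstuvwxyz0123456789");
        for (String bad : findOverlongIds(ids)) {
            System.err.println("identifier too long (" + bad.length()
                    + " > " + MAX_ID_LENGTH + "): " + bad);
        }
    }
}
```

Collecting the offenders instead of throwing on the first one lets the caller report every over-long identifier in one pass. Throwing an exception at the first hit, as the answer also suggests, is an equally valid choice.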

Edit: This question has already been answered here: Range quantifier syntax in ANTLR Regex