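I am trying to define lexer rules for PostgreSQL SQL.
The problem is that the operator definition and line comments conflict with each other.
For example, @--- is the operator @- followed by a -- comment, not the operator @---.
In grako, a negative lookahead can be defined for the - fragment, like this: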
OP_MINUS: '-' ! ( '-' ) .
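In ANTLR4, I could not find any way to roll back a fragment that has already been consumed.
Any ideas?
Here is the original definition of PostgreSQL operators: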
The operator name is a sequence of up to NAMEDATALEN-1
(63 by default) characters from the following list:
+ - * / < > = ~ ! @ # % ^ & | ` ?
There are a few restrictions on your choice of name:
-- and /* cannot appear anywhere in an operator name,
since they will be taken as the start of a comment.
A multicharacter operator name cannot end in + or -,
unless the name also contains at least one of these
characters:
~ ! @ # % ^ & | ` ?
For example, @- is an allowed operator name, but *- is not.
This restriction allows PostgreSQL to parse SQL-compliant
commands without requiring spaces between tokens.
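
To make the quoted restrictions concrete, here is a minimal Python sketch that checks a candidate operator name against them. It is only an illustration of the documentation text above, not PostgreSQL's actual scanner; the function name `is_valid_operator_name` and the constants are made up for this example.

```python
# Hypothetical helper names; the rules themselves come from the quoted text.
OP_CHARS = set("+-*/<>=~!@#%^&|`?")
# Characters that permit a trailing + or - in a multi-character name.
SPECIAL_CHARS = set("~!@#%^&|`?")
MAX_LEN = 63  # NAMEDATALEN-1 with the default settings


def is_valid_operator_name(name: str) -> bool:
    """Check a candidate name against the restrictions quoted above."""
    if not name or len(name) > MAX_LEN:
        return False
    if any(ch not in OP_CHARS for ch in name):
        return False
    # "--" and "/*" would be taken as the start of a comment.
    if "--" in name or "/*" in name:
        return False
    # A multi-character name may end in + or - only if it also
    # contains at least one of the special characters.
    if len(name) > 1 and name[-1] in "+-":
        return any(ch in SPECIAL_CHARS for ch in name)
    return True
```

With this check, @- is accepted and *- is rejected, matching the example in the quoted text, and @--- is rejected because it contains --.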
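Answer 0 (score: 2)
You can use a semantic predicate in a lexer rule to perform lookahead (or lookbehind) without consuming characters. For example, the following covers several of the rules for operators.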
OPERATOR
  : ( [+*<>=~!@#%^&|`?]
    | '-' {_input.LA(1) != '-'}?
    | '/' {_input.LA(1) != '*'}?
    )+
  ;
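
The idea behind the predicates can be sketched outside ANTLR. The following Python function is a rough, hand-written approximation of the single-rule OPERATOR definition: it consumes operator characters one at a time and peeks at the next character, much as `_input.LA(1)` does once '-' or '/' has been matched. The name `match_operator` is hypothetical, and a plain greedy loop is an assumption that only roughly stands in for how the ANTLR4 lexer actually matches.

```python
# Characters the rule accepts without any lookahead check.
SIMPLE_CHARS = set("+*<>=~!@#%^&|`?")


def match_operator(text: str, pos: int = 0) -> int:
    """Return the index just past the operator starting at pos (pos if none)."""
    end = pos
    while end < len(text):
        ch = text[end]
        # Peek at the following character without consuming it; an empty
        # string stands in for end of input.
        nxt = text[end + 1] if end + 1 < len(text) else ""
        if ch in SIMPLE_CHARS:
            end += 1
        elif ch == "-" and nxt != "-":
            end += 1  # a '-' that does not start a "--" comment
        elif ch == "/" and nxt != "*":
            end += 1  # a '/' that does not start a "/*" comment
        else:
            break
    return end
```

For instance, `match_operator("<=--x")` stops after `<=`, leaving the `--` for a comment rule to pick up.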
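However, this OPERATOR rule does not address the restriction on a + or - at the end of an operator. To handle that as simply as possible, I would probably split the two cases into separate rules.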
// this rule does not allow + or - at the end of an operator
OPERATOR
  : ( [*<>=~!@#%^&|`?]
    | ( '+'
      | '-' {_input.LA(1) != '-'}?
      )+
      [*<>=~!@#%^&|`?]
    | '/' {_input.LA(1) != '*'}?
    )+
  ;
// this rule allows + or - at the end of an operator and sets the type to OPERATOR
// it requires a character from the special subset to appear
OPERATOR2
  : ( [*<>=+]
    | '-' {_input.LA(1) != '-'}?
    | '/' {_input.LA(1) != '*'}?
    )*
    [~!@#%^&|`?]
    OPERATOR?
    ( '+'
    | '-' {_input.LA(1) != '-'}?
    )+
    -> type(OPERATOR)
  ;