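I have been trying to convert a grammar I found online to ANTLR4 format. The original grammar is here: https://github.com/cv/asp-parser/blob/master/vbscript.bnf.
Short version: I think the problem I am currently running into is caused by ambiguity in the lexing phase.
For example, I copied the rule for float literals as follows: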
float_literal : DIGIT* '.' DIGIT+ ( 'e' PLUS_OR_MINUS? DIGIT+ )?
| DIGIT+ 'e' PLUS_OR_MINUS? DIGIT+;
Elsewhere in the file I have a definition for letters:
LETTER: 'a'..'z';
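It seems that because I use 'e' in the float literal, that character can no longer be recognized as a letter? While researching this, I came across the idea of having one token per letter, so letter would become: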
letter: A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z;
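and I would replace every instance of 'e' with E. But this file also contains longer strings, such as '.and'. Would this approach mean replacing things like that with DOT A N D? That does not seem right at all.
Am I doing something fundamentally wrong, or is there something I can do to avoid this ambiguity?
Thanks, Craig
The complete grammar follows.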
grammar vbscript;
/*===== Character Sets =====*/
SPACES: ' ' -> skip;
DIGIT: '0'..'9';
SEMI_COLON: ':';
NEW_LINE_CHARACTER: [\r\n]+;
WHITESPACE_CHARACTER: [ \t];
LETTER: 'a'..'z';
QUOTE: '"';
HASH: '#';
SQUARE_BRACE: '[' | ']';
PLUS_OR_MINUS: [+-];
ANYTHING_ELSE: ~('"' | '#');
ws: WHITESPACE_CHARACTER;
id_tail: (DIGIT | LETTER | '_');
string_character: ANYTHING_ELSE | DIGIT | WHITESPACE_CHARACTER | SEMI_COLON | LETTER | PLUS_OR_MINUS | SQUARE_BRACE;
id_name_char: ANYTHING_ELSE | DIGIT | WHITESPACE_CHARACTER | SEMI_COLON | LETTER | PLUS_OR_MINUS;
/*===== terminals =====*/
whitespace: ws+ | '_' ws* new_line?;
comment_line : '' | 'rem';
string_literal : '"' ( string_character | '""' )* '"';
float_literal : DIGIT* '.' DIGIT+ ( 'e' PLUS_OR_MINUS? DIGIT+ )?
| DIGIT+ 'e' PLUS_OR_MINUS? DIGIT+;
id : LETTER id_tail*
| '[' id_name_char* ']';
iddot : LETTER id_tail* '.'
| '[' id_name_char* ']' '.'
| 'and.'
| 'byref.'
| 'byval.'
| 'call.'
| 'case.'
| 'class.'
| 'const.'
| 'default.'
| 'dim.'
| 'do.'
| 'each.'
| 'else.'
| 'elseif.'
| 'empty.'
| 'end.'
| 'eqv.'
| 'erase.'
| 'error.'
| 'exit.'
| 'explicit.'
| 'false.'
| 'for.'
| 'function.'
| 'get.'
| 'goto.'
| 'if.'
| 'imp.'
| 'in.'
| 'is.'
| 'let.'
| 'loop.'
| 'mod.'
| 'new.'
| 'next.'
| 'not.'
| 'nothing.'
| 'null.'
| 'on.'
| 'option.'
| 'or.'
| 'preserve.'
| 'private.'
| 'property.'
| 'public.'
| 'redim.'
| 'rem.'
| 'resume.'
| 'select.'
| 'set.'
| 'step.'
| 'sub.'
| 'then.'
| 'to.'
| 'true.'
| 'until.'
| 'wend.'
| 'while.'
| 'with.'
| 'xor.';
dot_id : '.' LETTER id_tail*
| '.' '[' id_name_char* ']'
| '.and'
| '.byref'
| '.byval'
| '.call'
| '.case'
| '.class'
| '.const'
| '.default'
| '.dim'
| '.do'
| '.each'
| '.else'
| '.elseif'
| '.empty'
| '.end'
| '.eqv'
| '.erase'
| '.error'
| '.exit'
| '.explicit'
| '.false'
| '.for'
| '.function'
| '.get'
| '.goto'
| '.if'
| '.imp'
| '.in'
| '.is'
| '.let'
| '.loop'
| '.mod'
| '.new'
| '.next'
| '.not'
| '.nothing'
| '.null'
| '.on'
| '.option'
| '.or'
| '.preserve'
| '.private'
| '.property'
| '.public'
| '.redim'
| '.rem'
| '.resume'
| '.select'
| '.set'
| '.step'
| '.sub'
| '.then'
| '.to'
| '.true'
| '.until'
| '.wend'
| '.while'
| '.with'
| '.xor';
dot_iddot : '.' LETTER id_tail* '.'
| '.' '[' id_name_char* ']' '.'
| '.and.'
| '.byref.'
| '.byval.'
| '.call.'
| '.case.'
| '.class.'
| '.const.'
| '.default.'
| '.dim.'
| '.do.'
| '.each.'
| '.else.'
| '.elseif.'
| '.empty.'
| '.end.'
| '.eqv.'
| '.erase.'
| '.error.'
| '.exit.'
| '.explicit.'
| '.false.'
| '.for.'
| '.function.'
| '.get.'
| '.goto.'
| '.if.'
| '.imp.'
| '.in.'
| '.is.'
| '.let.'
| '.loop.'
| '.mod.'
| '.new.'
| '.next.'
| '.not.'
| '.nothing.'
| '.null.'
| '.on.'
| '.option.'
| '.or.'
| '.preserve.'
| '.private.'
| '.property.'
| '.public.'
| '.redim.'
| '.rem.'
| '.resume.'
| '.select.'
| '.set.'
| '.step.'
| '.sub.'
| '.then.'
| '.to.'
| '.true.'
| '.until.'
| '.wend.'
| '.while.'
| '.with.'
| '.xor.';
/*===== rules =====*/
new_line: (SEMI_COLON | NEW_LINE_CHARACTER)+;
program: new_line? global_stmt_list;
/*===== rules: declarations =====*/
class_decl: 'class' extended_id new_line member_decl_list 'end' 'class' new_line;
member_decl_list: member_decl*;
member_decl: field_decl | var_decl | const_decl | sub_decl | function_decl | property_decl;
field_decl:
'private' field_name other_vars_opt new_line
| 'public' field_name other_vars_opt new_line;
field_name: field_id '(' array_rank_list ')' | field_id;
field_id: id | 'default' | 'erase' | 'error' | 'explicit' | 'step';
var_decl: 'dim' var_name other_vars_opt new_line;
var_name: extended_id '(' array_rank_list ')' | extended_id;
other_vars_opt: (',' var_name other_vars_opt)?;
array_rank_list: (int_literal ',' array_rank_list | int_literal)?;
const_decl: access_modifier_opt 'const' const_list new_line;
const_list: extended_id '=' const_expr_def ',' const_list | extended_id '=' const_expr_def;
const_expr_def: '(' const_expr_def ')'
| '-' const_expr_def
| '+' const_expr_def
| const_expr;
sub_decl:
method_access_opt 'sub' extended_id method_arg_list new_line method_stmt_list 'end' 'sub' new_line
| method_access_opt 'sub' extended_id method_arg_list inline_stmt 'end' 'sub' new_line;
function_decl:
method_access_opt 'function' extended_id method_arg_list new_line method_stmt_list 'end' 'function' new_line
| method_access_opt 'function' extended_id method_arg_list inline_stmt 'end' 'function' new_line;
method_access_opt: 'public' 'default' | access_modifier_opt;
access_modifier_opt: ('public' | 'private')?;
method_arg_list: ('(' arg_list? ')')?;
arg_list: arg (',' arg_list)?;
arg: arg_modifier_opt extended_id ('(' ')')?;
arg_modifier_opt: ('byval' | 'byref')?;
property_decl: method_access_opt 'property' property_access_type extended_id method_arg_list new_line method_stmt_list 'end' 'property' new_line;
property_access_type: 'get' | 'let' | 'set';
/*===== rules: statements =====*/
global_stmt: option_explicit | class_decl | field_decl | const_decl | sub_decl | function_decl | block_stmt;
method_stmt: const_decl | block_stmt;
block_stmt:
var_decl
| redim_stmt
| if_stmt
| with_stmt
| select_stmt
| loop_stmt
| for_stmt
| inline_stmt new_line;
inline_stmt:
assign_stmt
| call_stmt
| sub_call_stmt
| error_stmt
| exit_stmt
| 'erase' extended_id;
global_stmt_list: global_stmt_list global_stmt | global_stmt;
method_stmt_list: method_stmt*;
block_stmt_list: block_stmt*;
option_explicit: 'option' 'explicit' new_line;
error_stmt: 'on' 'error' 'resume' 'next' | 'on' 'error' 'goto' int_literal;
exit_stmt: 'exit' 'do' | 'exit' 'for' | 'exit' 'function' | 'exit' 'property' | 'exit' 'sub';
assign_stmt:
left_expr '=' expr
| 'set' left_expr '=' expr
| 'set' left_expr '=' 'new' left_expr;
sub_call_stmt: qualified_id sub_safe_expr? comma_expr_list
| qualified_id sub_safe_expr?
| qualified_id '(' expr ')' comma_expr_list
| qualified_id '(' expr ')'
| qualified_id '(' ')'
| qualified_id index_or_params_list '.' left_expr_tail sub_safe_expr? comma_expr_list
| qualified_id index_or_params_list_dot left_expr_tail sub_safe_expr? comma_expr_list
| qualified_id index_or_params_list '.' left_expr_tail sub_safe_expr?
| qualified_id index_or_params_list_dot left_expr_tail sub_safe_expr?;
call_stmt: 'call' left_expr;
left_expr: qualified_id index_or_params_list '.' left_expr_tail
| qualified_id index_or_params_list_dot left_expr_tail
| qualified_id index_or_params_list
| qualified_id
| safe_keyword_id;
left_expr_tail: qualified_id_tail index_or_params_list '.' left_expr_tail
| qualified_id_tail index_or_params_list_dot left_expr_tail
| qualified_id_tail index_or_params_list
| qualified_id_tail;
qualified_id: iddot qualified_id_tail
| dot_iddot qualified_id_tail
| id
| dot_id;
qualified_id_tail: iddot qualified_id_tail
| id
| keyword_id;
keyword_id: safe_keyword_id
| 'and'
| 'byref'
| 'byval'
| 'call'
| 'case'
| 'class'
| 'const'
| 'dim'
| 'do'
| 'each'
| 'else'
| 'elseif'
| 'empty'
| 'end'
| 'eqv'
| 'exit'
| 'false'
| 'for'
| 'function'
| 'get'
| 'goto'
| 'if'
| 'imp'
| 'in'
| 'is'
| 'let'
| 'loop'
| 'mod'
| 'new'
| 'next'
| 'not'
| 'nothing'
| 'null'
| 'on'
| 'option'
| 'or'
| 'preserve'
| 'private'
| 'public'
| 'redim'
| 'resume'
| 'select'
| 'set'
| 'sub'
| 'then'
| 'to'
| 'true'
| 'until'
| 'wend'
| 'while'
| 'with'
| 'xor';
safe_keyword_id: 'default'
| 'erase'
| 'error'
| 'explicit'
| 'property'
| 'step';
extended_id: safe_keyword_id
| id;
index_or_params_list: index_or_params index_or_params_list
| index_or_params;
index_or_params: '(' expr comma_expr_list ')'
| '(' comma_expr_list ')'
| '(' expr ')'
| '(' ')';
index_or_params_list_dot: index_or_params index_or_params_list_dot
| index_or_params_dot;
index_or_params_dot: '(' expr comma_expr_list ').'
| '(' comma_expr_list ').'
| '(' expr ').'
| '(' ').';
comma_expr_list: ',' expr comma_expr_list
| ',' comma_expr_list
| ',' expr
| ',';
/* redim statement */
redim_stmt: 'redim' redim_decl_list new_line
| 'redim' 'preserve' redim_decl_list new_line;
redim_decl_list: redim_decl ',' redim_decl_list
| redim_decl;
redim_decl: extended_id '(' expr_list ')';
/* if statement */
if_stmt: 'if' expr 'then' new_line block_stmt_list else_stmt_list 'end' 'if' new_line
| 'if' expr 'then' inline_stmt else_opt end_if_opt new_line;
else_stmt_list: ('elseif' expr 'then' new_line block_stmt_list else_stmt_list
| 'elseif' expr 'then' inline_stmt new_line else_stmt_list
| 'else' inline_stmt new_line
| 'else' new_line block_stmt_list)?;
else_opt: ('else' inline_stmt)?;
end_if_opt : ('end' 'if')?;
/* with statement */
with_stmt: 'with' expr new_line block_stmt_list 'end' 'with' new_line;
/* loop statement */
loop_stmt: 'do' loop_type expr new_line block_stmt_list 'loop' new_line
| 'do' new_line block_stmt_list 'loop' loop_type expr new_line
| 'do' new_line block_stmt_list 'loop' new_line
| 'while' expr new_line block_stmt_list 'wend' new_line;
loop_type: 'while' | 'until';
/* for statement */
for_stmt: 'for' extended_id '=' expr 'to' expr step_opt new_line block_stmt_list 'next' new_line
| 'for' 'each' extended_id 'in' expr new_line block_stmt_list 'next' new_line;
step_opt: ('step' expr)?;
/* select statement */
select_stmt: 'select' 'case' expr new_line cast_stmt_list 'end' 'select' new_line;
cast_stmt_list: ('case' expr_list nl_opt block_stmt_list cast_stmt_list
| 'case' 'else' nl_opt block_stmt_list)?;
nl_opt: new_line?;
expr_list: expr ',' expr_list | expr;
/*===== rules: expressions =====*/
sub_safe_expr: sub_safe_imp_expr;
sub_safe_imp_expr: sub_safe_imp_expr 'imp' eqv_expr | sub_safe_eqv_expr;
sub_safe_eqv_expr: sub_safe_eqv_expr 'eqv' xor_expr
| sub_safe_xor_expr;
sub_safe_xor_expr: sub_safe_xor_expr 'xor' or_expr
| sub_safe_or_expr;
sub_safe_or_expr: sub_safe_or_expr 'or' and_expr
| sub_safe_and_expr;
sub_safe_and_expr : sub_safe_and_expr 'and' not_expr
| sub_safe_not_expr;
sub_safe_not_expr : 'not' not_expr
| sub_safe_compare_expr;
sub_safe_compare_expr : sub_safe_compare_expr 'is' concat_expr
| sub_safe_compare_expr 'is' 'not' concat_expr
| sub_safe_compare_expr '>=' concat_expr
| sub_safe_compare_expr '=>' concat_expr
| sub_safe_compare_expr '<=' concat_expr
| sub_safe_compare_expr '=<' concat_expr
| sub_safe_compare_expr '>' concat_expr
| sub_safe_compare_expr '<' concat_expr
| sub_safe_compare_expr '<>' concat_expr
| sub_safe_compare_expr '=' concat_expr
| sub_safe_concat_expr;
sub_safe_concat_expr : sub_safe_concat_expr '&' add_expr
| sub_safe_add_expr;
sub_safe_add_expr : sub_safe_add_expr '+' mod_expr
| sub_safe_add_expr '-' mod_expr
| sub_safe_mod_expr;
sub_safe_mod_expr : sub_safe_mod_expr 'mod' int_div_expr
| sub_safe_int_div_expr;
sub_safe_int_div_expr : sub_safe_int_div_expr '\\' mult_expr
| sub_safe_mult_expr;
sub_safe_mult_expr : sub_safe_mult_expr '*' unary_expr
| sub_safe_mult_expr '/' unary_expr
| sub_safe_unary_expr;
sub_safe_unary_expr : '-' unary_expr
| '+' unary_expr
| sub_safe_exp_expr;
sub_safe_exp_expr : sub_safe_value '^' exp_expr
| sub_safe_value;
sub_safe_value : const_expr
| left_expr
| '(' expr ')';
expr : imp_expr;
imp_expr : imp_expr 'imp' eqv_expr
| eqv_expr;
eqv_expr : eqv_expr 'eqv' xor_expr
| xor_expr;
xor_expr : xor_expr 'xor' or_expr
| or_expr;
or_expr : or_expr 'or' and_expr
| and_expr;
and_expr : and_expr 'and' not_expr
| not_expr;
not_expr : 'not' not_expr
| compare_expr;
compare_expr : compare_expr 'is' concat_expr
| compare_expr 'is' 'not' concat_expr
| compare_expr '>=' concat_expr
| compare_expr '=>' concat_expr
| compare_expr '<=' concat_expr
| compare_expr '=<' concat_expr
| compare_expr '>' concat_expr
| compare_expr '<' concat_expr
| compare_expr '<>' concat_expr
| compare_expr '=' concat_expr
| concat_expr;
concat_expr : concat_expr '&' add_expr
| add_expr;
add_expr : add_expr '+' mod_expr
| add_expr '-' mod_expr
| mod_expr;
mod_expr : mod_expr 'mod' int_div_expr
| int_div_expr;
int_div_expr : int_div_expr '\\' mult_expr
| mult_expr;
mult_expr : mult_expr '*' unary_expr
| mult_expr '/' unary_expr
| unary_expr;
unary_expr : '-' unary_expr
| '+' unary_expr
| exp_expr;
exp_expr : value '^' exp_expr
| value;
value : const_expr
| left_expr
| '(' expr ')';
const_expr : bool_literal
| int_literal
| float_literal
| string_literal
| nothing;
bool_literal : 'true'
| 'false';
int_literal : DIGIT+;
nothing : 'nothing'
| 'null'
| 'empty';
Answer 0 (score: 0)
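Your grammar defines the "literals" in the parser section. Note that ANTLR treats every rule whose name starts with a lowercase letter as a parser rule (rules whose names start with an uppercase letter are lexer rules). A quoted literal such as 'e' inside a parser rule becomes an implicit token of its own, and that token takes precedence over LETTER for the same input, which is why a lone e is no longer lexed as a letter.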
Your immediate problem could be solved like this:
FLOAT_LITERAL
: DIGIT* '.' DIGIT+ ( 'e' PLUS_OR_MINUS? DIGIT+ )?
| DIGIT+ 'e' PLUS_OR_MINUS? DIGIT+;
LETTER
: [a-z];
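The ANTLR lexer prefers the rule that matches the longest input (if two rules match input of the same length, the rule defined first wins). These two rules never match the same input, so the order in which they are defined does not matter here (although it is more readable to define the more complex rules above the basic ones).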
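To illustrate the longest-match and first-rule-wins behaviour, here is a minimal sketch of a standalone lexer grammar. The grammar name LongestMatchDemo and the rules END, ID, INT, FLOAT and WS are made up for this illustration and are not taken from the grammar in the question.

```antlr
lexer grammar LongestMatchDemo;

// Hypothetical rule names, for illustration only.

// END is listed before ID so that it wins when both match the same length.
END   : 'end';
ID    : [a-z] [a-z0-9_]*;

// FLOAT can consume more characters than INT for input such as 1e5.
FLOAT : [0-9]* '.' [0-9]+ ('e' [+-]? [0-9]+)?
      | [0-9]+ 'e' [+-]? [0-9]+
      ;
INT   : [0-9]+;

WS    : [ \t]+ -> skip;
```

With these rules the input end is lexed as END (a tie with ID, resolved by rule order), ending is lexed as ID (longest match), 1e5 is lexed as FLOAT, and e5 is lexed as ID. So using 'e' inside a lexer rule for floats does not stop e from appearing in identifiers.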
You can extend the second definition to include uppercase characters:
LETTER
: [a-zA-Z];
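Since VBScript keywords are case-insensitive, the one-token-per-letter idea from the question can be kept inside the lexer by using fragment rules. The following is only a sketch: the grammar name CaseInsensitiveDemo and the rules AND, ID, DIGIT, A, D and N are assumptions made for this example.

```antlr
lexer grammar CaseInsensitiveDemo;

// Hypothetical rule names, for illustration only.

// The keyword is spelled with letter fragments, so and, And and AND
// all produce a single AND token.
AND : A N D;

// Listed after AND, so the keyword wins when both match the same length.
ID  : LETTER (LETTER | DIGIT | '_')*;

// Fragments never become tokens themselves; they are only building
// blocks for other lexer rules.
fragment LETTER : [a-zA-Z];
fragment DIGIT  : [0-9];
fragment A : [aA];
fragment D : [dD];
fragment N : [nN];
```

Because fragments exist only inside the lexer, the parser still sees one AND token rather than three letter tokens, so parser rules never need to spell out something like DOT A N D.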
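To solve the overall problems with the grammar, you will need to rewrite it completely. Most of the rules in the terminals section should be lexer rules. However, the terminals section seems overpopulated, so some of its rules may also be workarounds for parser rules that do not exist.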
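As a rough starting point, a few of the terminals might be moved into the lexer along these lines. This is a simplified sketch under several assumptions: the grammar name VbscriptTerminalsSketch, the entry rule literals, and the fragments EXPONENT, DIGIT and LETTER are invented here, and the string and bracketed-identifier rules are looser than the character sets in the original grammar.

```antlr
grammar VbscriptTerminalsSketch;

// Hypothetical entry rule, present only to make the sketch self-contained.
literals : (STRING_LITERAL | FLOAT_LITERAL | INT_LITERAL | ID | DOT)* EOF;

// A doubled quote inside a string stands for one literal quote.
STRING_LITERAL : '"' ( ~["\r\n] | '""' )* '"';

FLOAT_LITERAL  : DIGIT* '.' DIGIT+ EXPONENT?
               | DIGIT+ EXPONENT
               ;
INT_LITERAL    : DIGIT+;

// Plain identifiers, or bracketed identifiers such as [my name].
ID             : LETTER (LETTER | DIGIT | '_')*
               | '[' ~[\]\r\n]* ']'
               ;

DOT            : '.';
WS             : [ \t]+ -> skip;

fragment EXPONENT : [eE] [+-]? DIGIT+;
fragment DIGIT    : [0-9];
fragment LETTER   : [a-zA-Z];
```

With ID and DOT as separate tokens, long alternatives such as '.and' or 'and.' could in principle be expressed in parser rules as a DOT next to an identifier or keyword token. Whether that is acceptable depends on how much the language cares about whitespace around the dot, which this sketch ignores by skipping whitespace.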