Below is the grammar I'm trying to generate a parser for, in EBNF format (mostly; the actual syntax is documented here):
expr = lambda_expr_list $;
lambda_expr_list = [ lambda_expr_list "," ] lambda_expr;
lambda_expr = conditional_expr [ "->" lambda_expr ];
conditional_expr = boolean_or_expr [ "if" conditional_expr "else" conditional_expr ];
boolean_or_expr = [ boolean_or_expr "or" ] boolean_xor_expr;
boolean_xor_expr = [ boolean_xor_expr "xor" ] boolean_and_expr;
boolean_and_expr = [ boolean_and_expr "and" ] boolean_not_expr;
boolean_not_expr = [ "not" ] relation;
relation = [ relation ( "=="
| "!="
| ">"
| "<="
| "<"
| ">="
| [ "not" ] "in"
| "is" [ "not" ] ) ] bitwise_or_expr;
bitwise_or_expr = [ bitwise_or_expr "|" ] bitwise_xor_expr;
bitwise_xor_expr = [ bitwise_xor_expr "^" ] bitwise_and_expr;
bitwise_and_expr = [ bitwise_and_expr "&" ] bitwise_shift_expr;
bitwise_shift_expr = [ bitwise_shift_expr ( "<<"
| ">>" ) ] subtraction_expr;
subtraction_expr = [ subtraction_expr "-" ] addition_expr;
addition_expr = [ addition_expr "+" ] division_expr;
division_expr = [ division_expr ( "/"
| "\\" ) ] multiplication_expr;
multiplication_expr = [ multiplication_expr ( "*"
| "%" ) ] negative_expr;
negative_expr = [ "-" ] positive_expr;
positive_expr = [ "+" ] bitwise_not_expr;
bitwise_not_expr = [ "~" ] power_expr;
power_expr = slice_expr [ "**" power_expr ];
slice_expr = member_access_expr { subscript };
subscript = "[" slice_defn_list "]";
slice_defn_list = [ slice_defn_list "," ] slice_defn;
slice_defn = lambda_expr
| [ lambda_expr ] ":" [ [ lambda_expr ] ":" [ lambda_expr ] ];
member_access_expr = [ member_access_expr "." ] function_call_expr;
function_call_expr = atom { parameter_list };
parameter_list = "(" [ lambda_expr_list ] ")";
atom = identifier
| scalar_literal
| nary_literal;
identifier = /[_A-Za-z][_A-Za-z0-9]*/;
scalar_literal = float_literal
| integer_literal
| boolean_literal;
float_literal = point_float_literal
| exponent_float_literal;
point_float_literal = /[0-9]+?\.[0-9]+|[0-9]+\./;
exponent_float_literal = /([0-9]+|[0-9]+?\.[0-9]+|[0-9]+\.)[eE][+-]?[0-9]+/;
integer_literal = dec_integer_literal
| oct_integer_literal
| hex_integer_literal
| bin_integer_literal;
dec_integer_literal = /[1-9][0-9]*|0+/;
oct_integer_literal = /0[oO][0-7]+/;
hex_integer_literal = /0[xX][0-9a-fA-F]+/;
bin_integer_literal = /0[bB][01]+/;
boolean_literal = "true"
| "false";
nary_literal = tuple_literal
| list_literal
| dict_literal
| string_literal
| byte_string_literal;
tuple_literal = "(" [ lambda_expr_list ] ")";
list_literal = "[" [ ( lambda_expr_list
| list_comprehension ) ] "]";
list_comprehension = lambda_expr "for" lambda_expr_list "in" lambda_expr [ "if" lambda_expr ];
dict_literal = "{" [ ( dict_element_list
| dict_comprehension ) ] "}";
dict_element_list = [ dict_element_list "," ] dict_element;
dict_element = lambda_expr ":" lambda_expr;
dict_comprehension = dict_element "for" lambda_expr_list "in" lambda_expr [ "if" lambda_expr ];
string_literal = /[uU]?[rR]?(\u0027(\\.|[^\\\r\n\u0027])*\u0027|\u0022(\\.|[^\\\r\n\u0022])*\u0022)/;
byte_string_literal = /[bB][rR]?(\u0027(\\[\u0000-\u007F]|[\u0000-\u0009\u000B-\u000C\u000E-\u0026\u0028-\u005B\u005D-\u007F])*\u0027|\u0022(\\[\u0000-\u007F]|[\u0000-\u0009\u000B-\u000C\u000E-\u0021\u0023-\u005B\u005D-\u007F])*\u0022)/;
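The tool I'm using to generate the parser is Grako, which generates a modified Packrat parser that claims to support both direct and indirect left recursion.
When I run the generated parser on this string: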
input.filter(e -> e[0] in ['t', 'T']).map(e -> (e.len().str(), e)).map(e -> '(Line length: ' + e[0] + ') ' + e[1]).list()
I get the following error:
grako.exceptions.FailedParse: (1:13) Expecting end of text. :
input.filter(e -> e[0] in ['t', 'T']).map(e -> (e.len().str(), e)).map(e -> '(Line length: ' + e[0] + ') ' + e[1]).list()
^
expr
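Debugging shows that the parser seems to reach the end of the first e[0], and then never backtracks to (or otherwise reaches) a point where it would try to match the in token.
Is there a problem with my grammar that would make a left-recursion-supporting Packrat parser fail on it? Or should I file an issue on the Grako issue tracker?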
Answer 0 (score: 4)
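It may be a bug in the grammar, but the error message doesn't tell you where it actually occurred. What I always do after finishing a grammar is embed cut (~) elements throughout it: after keywords such as if, after operators, after opening parentheses, everywhere it is reasonable.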
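The cut element makes the Grako-generated parser commit to the option it has taken in the closest enclosing choice in the parse tree. That way, instead of the parser failing back at the start of the if, it reports the failure at the expression it actually couldn't parse.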
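To make the effect of a cut concrete, here is a toy model in plain Python. It is not Grako's implementation, and every name in it (ParseError, Committed, choice, if_stmt, error_position) is made up for illustration. It only sketches the idea: a failure after a cut is no longer swallowed by the enclosing choice, so the reported position is where parsing really got stuck.

```python
class ParseError(Exception):
    """Ordinary failure: an enclosing choice may backtrack and try another option."""
    def __init__(self, pos, msg):
        super().__init__(msg)
        self.pos, self.msg = pos, msg


class Committed(ParseError):
    """Failure after a cut: the enclosing choice must not backtrack over it."""


def token(text, pos, tok):
    # Match a literal token at pos and return the position just after it.
    if text.startswith(tok, pos):
        return pos + len(tok)
    raise ParseError(pos, f"expecting {tok!r}")


def choice(text, pos, *options):
    # Ordered choice: try each option in turn, backtracking on plain failures.
    for option in options:
        try:
            return option(text, pos)
        except Committed:
            raise          # a cut was passed: report this failure as-is
        except ParseError:
            continue       # no cut: silently try the next option
    raise ParseError(pos, "no option matched")


def if_stmt(text, pos, use_cut):
    pos = token(text, pos, "if ")
    try:
        pos = token(text, pos, "x")        # stand-in for a condition
        return token(text, pos, " then")
    except ParseError as err:
        if use_cut:                        # models a cut placed right after "if"
            raise Committed(err.pos, err.msg) from None
        raise


def name(text, pos):
    return token(text, pos, "y")


def error_position(text, use_cut):
    # Return the position of the reported failure, or None if the parse succeeds.
    try:
        choice(text, 0, lambda t, p: if_stmt(t, p, use_cut), name)
    except ParseError as err:
        return err.pos
    return None


print(error_position("if x else", use_cut=False))  # failure reported at the start of the choice
print(error_position("if x else", use_cut=True))   # failure reported where " then" was expected
```

Without the cut, the failure is attributed to the start of the whole choice, which is the kind of unhelpful position seen in the question's error message. With the cut, it points at the token that was actually missing.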
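Some bugs in a grammar are hard to spot. For those, I just go through the parse trace to find out how far into the input the parser got, and why it decided it couldn't go any further.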
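I wouldn't use left recursion in a PEG parser for professional work, although it may be fine for simpler, academic work. I'd write the rule with a closure instead: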
boolean_or_expr = boolean_xor_expr {"or" boolean_xor_expr};
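Associativity can then be handled in a semantic action.
Also see the discussion under issue 49 against Grako. It says that the algorithm used to support left recursion does not always produce the expected associativity in the resulting AST.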
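As a minimal sketch of such a semantic action, the following assumes that the closure form of the rule yields an AST shaped like [first, [[op, operand], ...]]. That shape is an assumption, so check it against what your generated parser really returns. The names fold_left and ExprSemantics are hypothetical; the only convention relied on is that a semantics object has methods named after grammar rules.

```python
from functools import reduce


def fold_left(ast):
    """Fold [first, [[op, operand], ...]] into a left-nested (op, lhs, rhs) tree."""
    first, rest = ast
    # Each step wraps the tree built so far as the left operand of the next operator.
    return reduce(lambda lhs, pair: (pair[0], lhs, pair[1]), rest, first)


class ExprSemantics:
    # Hypothetical semantics class: one method per rule that needs folding.
    def boolean_or_expr(self, ast):
        return fold_left(ast)


# Assumed AST for "a or b or c" under the closure form of the rule.
tree = ExprSemantics().boolean_or_expr(['a', [['or', 'b'], ['or', 'c']]])
print(tree)  # ('or', ('or', 'a', 'b'), 'c')
```

The same helper can serve every left-associative binary rule in the grammar. A right-associative rule such as power_expr would fold from the right instead.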