I'm trying to write a regular expression parser, and I can't figure out how to handle the anchor character (`$`) in a recursive descent parser.
Here is my BNF:
```
regex         = "^" regex
              | regex "$"
              | term ( "|" term )*
term          = concat concat*
concat        = element "*"
              | element "+"
              | element "?"
              | element "{" int* "}"
              | element "{" int* "," "}"
              | element "{" int* "," int* "}"
element       = "(" regex ")" | escaped_char | range | int | metacharacter | char
ranges        = "[" range* "]"
range         = char "-" char
metacharacter = ...
escaped_char  = ...
int           = 0 .. 9
char          = ascii char
```
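Deciding on `$` with a single token of lookahead presumes the lexer already separates the metacharacter `$` from a literal `\$` (the grammar's `escaped_char`). Below is a minimal tokenizer sketch. The metacharacter set `SPECIAL` is a hypothetical choice, because the question elides `metacharacter = ...`; the `(kind, text)` token shape and the `"end"` sentinel are likewise assumptions.

```python
from typing import List, Tuple

# Hypothetical metacharacter set: the question leaves `metacharacter = ...` open.
SPECIAL = set("^$|()*+?{}[]")

Token = Tuple[str, str]  # (kind, text) where kind is "op", "char" or "end"


def tokenize(pattern: str) -> List[Token]:
    """Split a pattern into tokens so the parser can peek at exactly one."""
    tokens: List[Token] = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern):
            # An escaped character is always a literal, so "\$" can never
            # be mistaken for the end anchor by the parser.
            tokens.append(("char", pattern[i + 1]))
            i += 2
        elif c in SPECIAL:
            tokens.append(("op", c))
            i += 1
        else:
            tokens.append(("char", c))
            i += 1
    tokens.append(("end", ""))  # sentinel so peeking never runs off the end
    return tokens
```

With this split, the parser's lookahead tells `("op", "$")` apart from `("char", "$")` without looking any further ahead.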
More specifically, how do I handle `$` with only one token of lookahead?
The last concat node has to be captured by the `$` symbol. Can that be handled in a recursive descent parser, or do I need a different parsing algorithm?
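As written, the alternative `regex "$"` is left-recursive, so a `parse_regex` function would call itself before consuming any input. One LL(1)-friendly rewrite is `regex = "^"* alternation "$"*`, where `alternation = term ( "|" term )*`. It generates the same strings as the three original alternatives. The sketch below assumes `$` is a metacharacter that `char` never matches, so `$` is not in FIRST(concat): the concat loop stops on it, and `regex` then sees `$` as its one lookahead token. It is a simplification with hypothetical names: tokens are single characters, the AST is plain tuples, and escapes, ranges and `{}` repeats are omitted.

```python
# Single characters act as tokens in this simplified sketch.
META = set("^$|()*+?")


class Parser:
    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def peek(self) -> str:
        # One character of lookahead; "" means end of input.
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def eat(self, expected: str) -> None:
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r} at {self.pos}")
        self.pos += 1

    def regex(self):
        # regex = "^"* alternation "$"*   (left recursion removed)
        anchored_start = False
        while self.peek() == "^":
            self.eat("^")
            anchored_start = True
        node = self.alternation()
        # The concat loop stopped at "$" because "$" cannot start a concat,
        # so a single lookahead token is enough to detect it here.
        anchored_end = False
        while self.peek() == "$":
            self.eat("$")
            anchored_end = True
        if anchored_start or anchored_end:
            node = ("anchor", anchored_start, anchored_end, node)
        return node

    def alternation(self):
        node = self.term()
        while self.peek() == "|":
            self.eat("|")
            node = ("alt", node, self.term())
        return node

    def starts_concat(self) -> bool:
        # FIRST(concat): "(" or any non-metacharacter. "$" is not in it.
        c = self.peek()
        return c == "(" or (c != "" and c not in META)

    def term(self):
        # term = concat concat*
        if not self.starts_concat():
            raise SyntaxError(f"unexpected {self.peek()!r} at {self.pos}")
        node = self.concat()
        while self.starts_concat():
            node = ("cat", node, self.concat())
        return node

    def concat(self):
        node = self.element()
        if self.peek() in ("*", "+", "?"):
            op = self.peek()
            self.eat(op)
            node = (op, node)
        return node

    def element(self):
        if self.peek() == "(":
            self.eat("(")
            node = self.regex()  # anchors may appear inside a group
            self.eat(")")
            return ("group", node)
        c = self.peek()  # only called when starts_concat() is true
        self.pos += 1
        return ("char", c)


def parse(text: str):
    p = Parser(text)
    node = p.regex()
    if p.peek() != "":
        raise SyntaxError(f"unexpected {p.peek()!r} at {p.pos}")
    return node
```

For example, `parse("a|b$")` returns `("anchor", False, True, ("alt", ("char", "a"), ("char", "b")))`: following the BNF above, the anchor applies to the whole alternation. Most regex engines instead bind `^` and `$` inside each alternative. Patterns such as `a$|b` are rejected here, as they are by the original grammar.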
Here is what I came up with (feel free to comment on it if that helps clarify my question):
```
function parse_term token_list : (ast * token_list) =
    next_token = lookahead token_list
    if next_token is '^'
        consume_tok ()
        anchor_group = parse_concat token_list
        if next_token is in follow_set(concat)
            consume_tok ()
            concat1 = parse_concat token_list
            while next_token is in follow_set(concat)
                consume_tok ()
                concat1 = construct_concat (concat1, parse_concat token_list)
            return construct_concat (anchor_group, concat1)
        else
            return anchor_group
    else if next_token is in follow_set(concat):
        concat1 = parse_concat token_list
        while next_token is in follow_set(concat)
            consume_tok ()
            concat1 = construct_concat (concat1, parse_concat token_list)
        (* Here I need to handle the $ metacharacter for the last concat node,
           but it has already been handled in the while loop. *)
        ....
```
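A different option, which many regex engines use, is to make `^` and `$` zero-width atoms: add them as alternatives of `element`. The `$` then simply becomes the last concat node, and the while loop handling it is correct rather than a problem. Below is a minimal sketch under that assumption; the class name, `NOT_ATOM` and the tuple AST are hypothetical. Unlike the pseudocode above, the loop tests FIRST(concat) on the lookahead and does not call `consume_tok()` before parsing the next concat.

```python
# Hypothetical variant: "^" and "$" are zero-width atoms parsed by element().
QUANTIFIERS = ("*", "+", "?")
NOT_ATOM = set("|)*+?")  # characters that cannot start a concat


class AtomParser:
    def __init__(self, text: str) -> None:
        self.text, self.pos = text, 0

    def peek(self) -> str:
        # One character of lookahead; "" means end of input.
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def advance(self) -> str:
        c = self.peek()
        self.pos += 1
        return c

    def regex(self):
        # regex = term ("|" term)*
        node = self.term()
        while self.peek() == "|":
            self.advance()
            node = ("alt", node, self.term())
        return node

    def term(self):
        # term = concat concat*; "$" is in FIRST(concat), so it is parsed
        # here as an ordinary (zero-width) last concat node.
        node = self.concat()
        while self.peek() != "" and self.peek() not in NOT_ATOM:
            node = ("cat", node, self.concat())
        return node

    def concat(self):
        node = self.element()
        if self.peek() in QUANTIFIERS:
            node = (self.advance(), node)
        return node

    def element(self):
        c = self.advance()
        if c == "(":
            node = self.regex()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return ("group", node)
        if c == "^":
            return ("bol",)  # beginning-of-line assertion
        if c == "$":
            return ("eol",)  # end-of-line assertion
        if c == "" or c in NOT_ATOM:
            raise SyntaxError(f"unexpected {c!r} at {self.pos - 1}")
        return ("char", c)


def parse_atoms(text: str):
    p = AtomParser(text)
    node = p.regex()
    if p.peek() != "":
        raise SyntaxError(f"unexpected {p.peek()!r} at {p.pos}")
    return node
```

With this approach, `parse_atoms("a$|b")` yields `("alt", ("cat", ("char", "a"), ("eol",)), ("char", "b"))`, so each alternative carries its own anchors. Whether something like `$*` should be rejected is left open in this sketch.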