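I'm reading this introductory book on parsing (which is pretty good, by the way), and one of the exercises is to "build a parser for your favorite language". Since I don't want to die today, I figured I could write a parser for something relatively simple, namely a simplified CSS.
Note: the book teaches how to write an LL(1) parser using the recursive descent algorithm.
So, as a sub-exercise, I am building the grammar from what I know of CSS. But I'm stuck on a production that I can't transform into LL(1):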
//EBNF
block = "{", declaration, {";", declaration}, [";"], "}"
//BNF
<block> ::= "{" <declaration> "}"
<declaration> ::= <single-declaration> <opt-end> | <single-declaration> ";" <declaration>
<opt-end> ::= "" | ";"
This describes a CSS block. A valid block can take any of the following forms:
{ property : value }
{ property : value; }
{ property : value; property : value }
{ property : value; property : value; }
...
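The problem is the optional ";" at the end: it overlaps with the first token of {";", declaration}, so when my parser meets a semicolon in this context, it doesn't know which alternative to take.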
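To make the conflict concrete, here is a minimal Python sketch (not from the book) that transcribes the EBNF rule literally. The `tokenize` helper, the function names, and the assumption that a declaration is just `identifier : identifier` are made up for illustration.

```python
PUNCT = {"{", "}", ":", ";"}


def tokenize(text):
    """Pad punctuation with spaces, then split; anything else is an identifier."""
    for ch in PUNCT:
        text = text.replace(ch, f" {ch} ")
    return text.split()


def parse_block_naive(tokens):
    """Literal transcription of the EBNF rule, using one token of lookahead."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def terminal(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {peek()!r}")
        pos += 1

    def identifier():
        nonlocal pos
        if peek() is None or peek() in PUNCT:
            raise SyntaxError(f"expected identifier, got {peek()!r}")
        pos += 1

    def declaration():
        identifier()
        terminal(":")
        identifier()

    terminal("{")
    declaration()
    # {";", declaration}: on ";" the loop always assumes another declaration follows
    while peek() == ";":
        terminal(";")
        declaration()
    # [";"]: never taken, because the loop above has already consumed every ";"
    if peek() == ";":
        terminal(";")
    terminal("}")
    return True


def accepts(text):
    """Return True if the naive parser accepts the block, False on a syntax error."""
    try:
        return parse_block_naive(tokenize(text))
    except SyntaxError:
        return False
```

With this sketch, blocks without a trailing semicolon are accepted, while `{ property : value; }` is rejected: after the last ";" the loop commits to parsing a declaration and finds "}" instead.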
The book discusses this problem, but in its example the semicolon is mandatory, so the rule can be rewritten like this:
block = "{", declaration, ";", {declaration, ";"}, "}"
So, is it possible to achieve what I want with an LL(1) parser?
Answer 0 (score: 1)
If you allow empty declarations, the ambiguity goes away:
//EBNF
block = "{", declaration, {";", declaration}, "}"
declaration = "" | ...
//BNF
<block> ::= "{" <declaration-list> "}"
<declaration-list> ::= <declaration> | <declaration> ";" <declaration-list>
<declaration> ::= "" | ...
That isn't a requirement, though:
// these don't allow empty declarations
block = "{", {declaration, ";"}, end-declaration, "}"
end-declaration = "" | declaration

<block> ::= "{" <declaration-list> "}"
<declaration-list> ::= "" | <declaration> | <declaration> ";" <declaration-list>
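To handle an empty production, work out which terminals can appear after it and have the parser recognize them. For recursive descent (in not-quite-Java):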
/*
<block> ::= "{" <declaration-list> "}"
<declaration-list> ::= <declaration> | <declaration> ";" <declaration-list>
<declaration> ::= "" | ...
*/
Node block(tokens) {
    terminal(BLOCKBEGIN, tokens);
    Node decls = declList(tokens);
    terminal(BLOCKEND, tokens);
    return decls;
}
Node declList(tokens) {
    Node dcl = decl(tokens);
    // expect() is assumed to only peek at the next token; terminal() consumes it
    if (expect(LINESEP, tokens)) {
        terminal(LINESEP, tokens);
        return new DeclarationList(dcl, declList(tokens));
    } else {
        return new DeclarationList(dcl);
    }
}
Node decl(tokens) {
    // empty declaration: the next token is one that can follow a declaration (";" or "}")
    if (expect(BLOCKEND, tokens) || expect(LINESEP, tokens)) {
        return new Declaration();
    }
    ...
}
/*
<block> ::= "{" <declaration-list> "}"
<declaration-list> ::= "" | <declaration> | <declaration> ";" <declaration-list>
*/
Node block(tokens) {
    terminal(BLOCKBEGIN, tokens);
    Node decls = declList(tokens);
    terminal(BLOCKEND, tokens);
    return decls;
}
Node declList(tokens) {
    // empty list: "}" is the only terminal that can follow a declaration list
    if (expect(BLOCKEND, tokens)) {
        return new DeclarationList();
    }
    Node dcl = decl(tokens);
    if (expect(LINESEP, tokens)) {
        terminal(LINESEP, tokens);
        return new DeclarationList(dcl, declList(tokens));
    } else {
        return new DeclarationList(dcl);
    }
}
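For anyone who wants to run it, the second variant above (no empty declarations) might look roughly like this in Python. This is only a sketch: the tokenizer, the `Parser` class, and the assumption that a declaration is `identifier : identifier` are illustrative choices, not part of the pseudocode.

```python
PUNCT = {"{", "}", ":", ";"}


def tokenize(text):
    """Pad punctuation with spaces, then split; anything else is an identifier."""
    for ch in PUNCT:
        text = text.replace(ch, f" {ch} ")
    return text.split()


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Look at the next token without consuming it (None at end of input)."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def terminal(self, tok):
        """Consume the given terminal or fail."""
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def identifier(self):
        """Consume and return one identifier token."""
        tok = self.peek()
        if tok is None or tok in PUNCT:
            raise SyntaxError(f"expected identifier, got {tok!r}")
        self.pos += 1
        return tok

    def decl(self):
        prop = self.identifier()
        self.terminal(":")
        value = self.identifier()
        return (prop, value)

    def decl_list(self):
        # Empty alternative: "}" is the only terminal that can follow the list.
        if self.peek() == "}":
            return []
        first = self.decl()
        if self.peek() == ";":
            self.terminal(";")
            return [first] + self.decl_list()
        return [first]

    def block(self):
        self.terminal("{")
        decls = self.decl_list()
        self.terminal("}")
        return decls


def parse_block(text):
    """Parse one block and return its declarations as (property, value) pairs."""
    return Parser(tokenize(text)).block()
```

Because the ";" is consumed before the recursive call, the decision after a semicolon depends only on whether the next token is "}", which is a single token of lookahead.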
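For a table-driven top-down parser, the process is a little more explicit. When constructing the FOLLOWS relation, recursively replace the empty string with whatever follows the non-terminal that produced it.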
A → B C
B → b
C → D E
D → d | ""
E → e

FOLLOWS(B) ← FIRST(C) = FIRST(D) = {d, ""}
           += - {""} + FOLLOWS(D)
            = - {""} + FIRST(E)
            = - {""} + {e}
FOLLOWS(B) = {d, e}
Then fill in the parse table as usual.
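The substitution above can be checked mechanically. The following Python sketch computes FIRST and FOLLOW sets for the example grammar with the usual fixed-point iteration. The dictionary encoding of the grammar, the function names, and the "$" end-of-input marker are assumptions made for this illustration.

```python
EPS = ""  # the empty string, written "" in the grammar above

# Each non-terminal maps to its alternatives; [] is the empty production.
GRAMMAR = {
    "A": [["B", "C"]],
    "B": [["b"]],
    "C": [["D", "E"]],
    "D": [["d"], []],
    "E": [["e"]],
}


def first_of_sequence(symbols, first):
    """FIRST of a symbol string; contains EPS only if every symbol can vanish."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # iterate until no set grows any more
        changed = False
        for nt, productions in GRAMMAR.items():
            for prod in productions:
                new = first_of_sequence(prod, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first, start="A"):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for nt, productions in GRAMMAR.items():
            for prod in productions:
                for i, sym in enumerate(prod):
                    if sym not in GRAMMAR:
                        continue  # terminals have no FOLLOW set
                    rest = first_of_sequence(prod[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        # everything after sym can vanish: inherit FOLLOW of the parent
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
```

Here `FOLLOW["B"]` comes out as the set containing d and e, matching the hand derivation: the empty alternative of D lets e show through.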
Answer 1 (score: 1)
I think I've got it:
//EBNF
block = "{", decl, "}"
decl = simple-decl, [";", [decl]]
simple-decl = ...
//BNF
<block> ::= "{" <decl> "}"
<decl> ::= <simple-decl> <decl-end>
<decl-end> ::= ";" <decl-tail> | ""
<decl-tail> ::= <decl> | ""
This yields the following code:
private function parseBlock():void {
    accept(Token.LBRACE);
    parseDecl();
    accept(Token.RBRACE);
}
//Token.IDENTIFIER is the starting token of a declaration
private function parseDecl():void {
    accept(Token.IDENTIFIER);
    if(_currentToken.kind == Token.SEMICOLON){
        accept(Token.SEMICOLON);
        if(_currentToken.kind == Token.IDENTIFIER){
            parseDecl();
        }
    }
}
Am I right?
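One way to sanity-check this is to port the two functions to something runnable and feed them the four block shapes from the question. Below is a rough Python port, under the same simplification as the code above: a whole declaration is stood in for by a single identifier token. The `Parser` class, the `accepts` helper, and the whitespace-separated token format are assumptions made for the test.

```python
SYMBOLS = {"{", "}", ";"}


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def current(self):
        """Return the current token, or None at end of input."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def at_identifier(self):
        """True if the current token could start a declaration."""
        tok = self.current()
        return tok is not None and tok not in SYMBOLS

    def accept(self, symbol):
        if self.current() != symbol:
            raise SyntaxError(f"expected {symbol!r}, got {self.current()!r}")
        self.pos += 1

    def accept_identifier(self):
        if not self.at_identifier():
            raise SyntaxError(f"expected identifier, got {self.current()!r}")
        self.pos += 1

    def parse_block(self):
        self.accept("{")
        self.parse_decl()
        self.accept("}")

    def parse_decl(self):
        self.accept_identifier()
        if self.current() == ";":
            self.accept(";")
            # After ";" one token decides: an identifier starts another declaration,
            # anything else ends the list (the empty alternative).
            if self.at_identifier():
                self.parse_decl()


def accepts(text):
    """True if the whole whitespace-separated token string parses as one block."""
    parser = Parser(text.split())
    try:
        parser.parse_block()
    except SyntaxError:
        return False
    return parser.pos == len(parser.tokens)
```

In this port, all four shapes from the question are accepted, and inputs such as a missing separator or a doubled semicolon are rejected. An empty block is rejected too, which matches the grammar since it requires at least one declaration.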