My input consists of repeated BLOCKs, where each block contains repeated BEGIN EVENT and END EVENT entries (an END EVENT always follows a BEGIN EVENT):
[TIMESTAMP] BLOCK
[TIMESTAMP] BEGIN EVENT
[TIMESTAMP] END EVENT
[TIMESTAMP] BEGIN EVENT
[TIMESTAMP] END EVENT
...
[TIMESTAMP] BLOCK
How would you disambiguate this grammar for LR(1)? I am using LALRPOP, and a minimal example is:
Timestamp = "[TIMESTAMP]";
BlockHeader = Timestamp "BLOCK";
Begin = Timestamp "BEGIN" "EVENT";
End = Timestamp "END" "EVENT";
Block = BlockHeader (Begin End)+;
pub Blocks = Block*;
Because LR(1) can only look one token ahead, the parser sees this grammar as ambiguous, as LALRPOP will tell you (partial error):
Local ambiguity detected
The problem arises after having observed the following symbols in the input:
BlockHeader (Begin End)+
At that point, if the next token is a `"[TIMESTAMP]"`, then the parser can proceed in two different ways.
First, the parser could execute the production at
/home/<snip>.lalrpop:51:9: 51:32, which would consume
the top 2 token(s) from the stack and produce a `Block`. This might then yield a parse tree like
BlockHeader (Begin End)+ Block
├─Block────────────────┤ │
├─Block+───────────────┘ │
└─Block+─────────────────────┘
Alternatively, the parser could shift the `"[TIMESTAMP]"` token and later use it to construct a
`Timestamp`. This might then yield a parse tree like
(Begin End)+ "[TIMESTAMP]" "BEGIN" "EVENT" End
│ ├─Timestamp─┘ │ │
│ └─Begin─────────────────────┘ │
└─(Begin End)+───────────────────────────────┘
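I can see it is telling me that, after parsing a BlockHeader followed by Begin and End pairs, it cannot tell whether the next token starts another Begin or another Block. I have not found a way to disambiguate this in LR(1), but I can only assume this is a gap in my understanding and not an inherent limitation of LR(1) grammars?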
Answer 0 (score: 2)
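Unfortunately, this kind of "needs more lookahead" problem is hard to fix without completely restructuring the grammar, which usually loses the desired structure of the input and sometimes accepts degenerate inputs that the original grammar would reject. You can generally reject those inputs and recover the structure by post-processing the parse tree, but that is more work. In your case, the grammar: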
Timestamp = "[TIMESTAMP]";
BlockHeader = Timestamp "BLOCK";
Begin = Timestamp "BEGIN" "EVENT";
End = Timestamp "END" "EVENT";
Event = Begin End;
Item = BlockHeader | Event;
pub Input = Item*;
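should do the trick, but it has the drawback that it loses the block structure (it gives you an unstructured sequence of block headers and events instead), and it accepts empty blocks. Both problems are easy to fix by post-processing the list of items.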
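A minimal sketch of that post-processing step, in Rust since LALRPOP targets Rust. The `Item` and `Block` types and the `group_blocks` function are hypothetical names introduced for illustration; they are not produced by LALRPOP, and a real parser would carry timestamps and event data rather than a bare count. The sketch also rejects events that appear before the first block header, which the flattened grammar accepts as well.

```rust
// Hypothetical result of parsing `Item*` with the flattened grammar.
#[derive(Debug, PartialEq)]
enum Item {
    BlockHeader,
    Event, // one Begin/End pair
}

// Hypothetical recovered block; only counts its events for brevity.
#[derive(Debug, PartialEq)]
struct Block {
    events: usize,
}

/// Regroup the flat item list into blocks, rejecting inputs that the
/// original rule `Block = BlockHeader (Begin End)+` would not accept.
fn group_blocks(items: &[Item]) -> Result<Vec<Block>, String> {
    let mut blocks: Vec<Block> = Vec::new();
    for (i, item) in items.iter().enumerate() {
        match item {
            Item::BlockHeader => {
                // A new header closes the previous block, which must not be empty.
                if let Some(prev) = blocks.last() {
                    if prev.events == 0 {
                        return Err(format!("empty block before item {}", i));
                    }
                }
                blocks.push(Block { events: 0 });
            }
            Item::Event => match blocks.last_mut() {
                Some(block) => block.events += 1,
                None => return Err(format!("event at item {} precedes any block header", i)),
            },
        }
    }
    // The final block must also contain at least one event.
    if let Some(last) = blocks.last() {
        if last.events == 0 {
            return Err("empty block at end of input".to_string());
        }
    }
    Ok(blocks)
}
```

The pass is linear in the number of items and keeps the grammar itself LR(1); the structural checks simply move from the parser into ordinary code.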
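When the required lookahead is small and bounded, another option is to handle it in the tokenizer. I am not familiar with LALRPOP, but it should be possible to "combine" the [TIMESTAMP] token with the keyword token that immediately follows it (so that the timestamp does not appear in the grammar at all and is just an attribute of the keyword), in which case a single token of lookahead is enough to make everything work.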
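A rough sketch of that token-merging idea, written as a standalone pass over an already-lexed token stream. The `RawToken` and `Token` enums and the `merge_timestamps` function are assumptions made for illustration, not LALRPOP API; wiring the merged tokens into LALRPOP would additionally require a custom lexer, which is not shown here.

```rust
// Hypothetical tokens as a plain lexer would emit them.
#[derive(Debug)]
enum RawToken {
    Timestamp(String),
    Block,
    Begin,
    End,
    Event,
}

// Hypothetical merged tokens: the timestamp becomes an attribute of the keyword.
#[derive(Debug, PartialEq)]
enum Token {
    Block(String),
    Begin(String),
    End(String),
    Event,
}

/// Fold each timestamp into the keyword that immediately follows it.
fn merge_timestamps(raw: &[RawToken]) -> Result<Vec<Token>, String> {
    let mut out = Vec::new();
    let mut iter = raw.iter();
    while let Some(tok) = iter.next() {
        let merged = match tok {
            // A timestamp must be followed directly by BLOCK, BEGIN, or END.
            RawToken::Timestamp(ts) => match iter.next() {
                Some(RawToken::Block) => Token::Block(ts.clone()),
                Some(RawToken::Begin) => Token::Begin(ts.clone()),
                Some(RawToken::End) => Token::End(ts.clone()),
                other => {
                    return Err(format!("expected keyword after timestamp, found {:?}", other))
                }
            },
            // EVENT carries no timestamp of its own and passes through unchanged.
            RawToken::Event => Token::Event,
            other => return Err(format!("keyword {:?} is missing its timestamp", other)),
        };
        out.push(merged);
    }
    Ok(out)
}
```

With tokens merged this way, the first token after a run of events is either a `Block` or a `Begin`, so one token of lookahead is enough to decide whether the current block continues or a new one starts.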