Why can't a left-recursive, non-deterministic, or ambiguous grammar be LL(1)?

Date: 2019-01-05 13:14:20

Tags: parsing compiler-construction grammar ll language-theory

I have learned from several sources that an LL(1) grammar is:

  1. unambiguous,
  2. not left-recursive,
  3. and deterministic (left-factored).

What I cannot fully understand is why the above is true for every LL(1) grammar. I know that an LL(1) parse table would have multiple entries in some cells, but what I am really after is a formal and general proof (without examples) of the following statement:

A grammar that is left-recursive, non-deterministic, or ambiguous is not LL(1).

2 answers:

Answer 0 (score: 1):

I did some more research and I think I have found solutions for the first and second points; as for the third, I found an existing solution, which I link to below. My proof attempts follow.

We start by writing down the three rules of the LL(1) grammar definition:

For every production A -> α | β with α ≠ β:

  1. FIRST(α) ∩ FIRST(β) = Ø
  2. If β =>* ε, then FIRST(α) ∩ FOLLOW(A) = Ø (and if α =>* ε, then FIRST(β) ∩ FOLLOW(A) = Ø).
  3. Including ε in rule (1) implies that at most one of α and β can derive ε.
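Rules (1) and (2) can be checked mechanically once FIRST sets are known. The following sketch (my own illustration, not part of the proof; the grammar encoding and helper names are hypothetical) computes FIRST sets by fixpoint iteration and tests rule (1) on a small LL(1) grammar:

```python
# Fixpoint computation of FIRST sets for a toy grammar, then a check of
# rule (1): the FIRST sets of the alternatives must be pairwise disjoint.
EPS = "ε"

def first_of_seq(seq, first):
    """FIRST of a sentential form, given FIRST sets of the nonterminals."""
    out = set()
    for sym in seq:
        f = first.get(sym, {sym})  # a terminal's FIRST set is itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)  # every symbol in seq can derive ε
    return out

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate until no FIRST set grows
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                f = first_of_seq(alt, first)
                if not f <= first[nt]:
                    first[nt] |= f
                    changed = True
    return first

# S -> a S | b  (LL(1): the alternatives start with distinct terminals)
grammar = {"S": [("a", "S"), ("b",)]}
first = compute_first(grammar)
alts = [first_of_seq(alt, first) for alt in grammar["S"]]
print(alts[0] & alts[1])  # set(): rule (1) holds for this grammar
```

The same `first_of_seq` helper can be reused to test rule (2) by intersecting with a FOLLOW set whenever an alternative can derive ε.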

Proposition 1: A grammar that is not left-factored is not LL(1).

Proof:

If a grammar G is not left-factored, then it contains productions of the form:

A -> ωα1 | ωα2 | ... | ωαn

(where each αi denotes the i-th string α, not a single symbol), with α1 ≠ α2 ≠ ... ≠ αn. Then we can easily show that:

∩(i=1,..,n) FIRST(ωαi) ≠ Ø

which contradicts rule (1) of the definition; therefore, a grammar that is not left-factored is not LL(1). ∎
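As a concrete illustration of Proposition 1 (my own addition, not part of the proof): building the LL(1) table row for a non-left-factored production such as A -> a b | a c puts both alternatives into the same cell, because both FIRST sets contain the first terminal of the shared prefix ω:

```python
# Non-left-factored production A -> a b | a c: both alternatives begin
# with the terminal "a", so they land in the same LL(1) table cell.
productions = {"A": [("a", "b"), ("a", "c")]}

table = {}
for nt, alts in productions.items():
    for alt in alts:
        lookahead = alt[0]  # both alternatives start with the terminal "a"
        table.setdefault((nt, lookahead), []).append(alt)

print(table[("A", "a")])  # [('a', 'b'), ('a', 'c')]: two entries, not LL(1)
```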

Proposition 2: A left-recursive grammar is not LL(1).

Proof:

If a grammar G is left-recursive, then it contains a production of the form:

S -> Sα | β

Three cases arise:

  1. If FIRST(β) ≠ {ε}, then:

        FIRST(β) ⊆ FIRST(S)

    =>  FIRST(β) ∩ FIRST(Sα) ≠ Ø

    which contradicts rule (1) of the definition.

  2. If FIRST(β) = {ε}, then:

    2.1. If ε ∈ FIRST(α), then:

    ε ∈ FIRST(Sα)

    which contradicts rule (3) of the definition.

    2.2. If ε ∉ FIRST(α), then:

        FIRST(α) ⊆ FIRST(S) (because β =>* ε, so S => Sα => βα =>* α)

    =>  FIRST(α) ⊆ FIRST(Sα) ........ (I)

    We also know that (since α follows S in the production S -> Sα):

    FIRST(α) ⊆ FOLLOW(S) ........ (II)

    From (I) and (II), and since ε ∉ FIRST(α), we have:

    FIRST(Sα) ∩ FOLLOW(S) ≠ Ø

    and since β =>* ε, this contradicts rule (2) of the definition.

In every case we reach a contradiction; therefore, a left-recursive grammar is not LL(1). ∎
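Case 1 of the proof can be observed on the concrete grammar S -> S a | b (my own illustration): FIRST(β) = {b} leaks into FIRST(S), so the two alternatives collide on the lookahead token b:

```python
# Left-recursive grammar S -> S a | b: FIRST(b) ⊆ FIRST(S), hence
# FIRST(Sa) and FIRST(b) intersect, violating rule (1).
grammar = {"S": [("S", "a"), ("b",)]}

first_S = set()
changed = True
while changed:  # fixpoint over FIRST(S); no alternative derives ε here
    changed = False
    for alt in grammar["S"]:
        sym = alt[0]
        f = first_S if sym == "S" else {sym}
        if not f <= first_S:
            first_S = first_S | f
            changed = True

first_Sa = first_S       # ε ∉ FIRST(S), so FIRST(Sa) = FIRST(S)
first_b = {"b"}          # FIRST(β)
print(first_Sa & first_b)  # {'b'}: the alternatives are not disjoint
```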

Proposition 3: An ambiguous grammar is not LL(1).

Proof:

While the proofs above are my own, this one is not; it comes from the answer by Kevin A. Naudé linked below:

https://stackoverflow.com/a/18969767/6275103

Answer 1 (score: 0):

The answer to these questions (and they are valid for LL(k) for any finite k) has to do with how the parsing stack works in an LL parser.

At the point where the parser is at the beginning of a non-terminal in a grammar, it must determine, by looking ahead only k tokens (1 in the LL(1) case), whether to push a specific rule onto the stack or to parse the text using other rules. So, let's look at each of these cases and see how it impacts that decision.

  1. Left-recursive. There are two left-recursive cases.

    a. The left-recursion has no tokens in it after the recursion. A rule something like:

nonterm: nonterm;

Such a rule has no effect: no matter how many times you recurse, what you are parsing does not change.

    b. The left-recursion has tokens in it after the recursion. A rule something like:

nonterm: nonterm “X”;

In this rule, you need to push nonterm rules onto the stack for as many Xs as follow the nonterm. You cannot determine how many Xs there are with only k tokens of lookahead. If you guess too small, you end up with Xs left over, and for any guess there will be a case in the language with more than that many X tokens. If you guess too large, you end up with extra nonterm rules on the stack that you don't get to remove. In either case, you are simply wrong.
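The standard escape from case (b), not mentioned in the answer, is to rewrite the left recursion as iteration. Assuming a hypothetical base alternative `nonterm: "Y";`, the pair `nonterm: nonterm "X" | "Y";` becomes `nonterm: "Y" tail; tail: "X" tail | ε;`, which a recursive-descent parser implements as a simple loop:

```python
# Recursive-descent sketch for the right-recursive rewrite
#   nonterm: "Y" tail;   tail: "X" tail | ε;
# The loop consumes however many X tokens are present; no guessing needed.
def parse_nonterm(tokens):
    pos = 0
    if pos == len(tokens) or tokens[pos] != "Y":
        raise SyntaxError("expected Y")
    pos += 1
    while pos < len(tokens) and tokens[pos] == "X":
        pos += 1  # each X corresponds to one level of the old recursion
    return pos  # number of tokens consumed

print(parse_nonterm(["Y", "X", "X", "X"]))  # 4
```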

  2. Non-deterministic. A non-deterministic grammar has the same characteristics as a left-recursive one: it is non-deterministic whether you should push or not. Palindrome languages are typical non-deterministic examples, but not the only ones. In a palindrome language, you don't know whether you should push another nonterminal onto the stack or use the token you are seeing to help you pop your way back up the stack. If you make the wrong choice, you again misparse the input.

  3. Ambiguous. Again the problem is similar. In this case there are two possible parses: one which pushes one nonterminal and successfully parses the input, and another which doesn't (possibly pushing a different non-terminal instead, either now or later in the parse). Either one will yield a correct parse. In the ambiguous case, pushing the nonterminal will not necessarily cause a parsing error; you will simply choose one of the potential parses while ignoring the other. If your semantics require that the other parse be chosen, the problem will rear its head later. Note, of course, that most ambiguous grammars are also non-deterministic.
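The palindrome point in case 2 can be made concrete (a sketch of my own): over a one-letter alphabet, the palindromes a^(2k) and a^(2k+1) agree on any fixed k-token lookahead at the start, yet they require the parser to turn around from pushing to popping at different points:

```python
# For any lookahead depth k, the palindromes a^(2k) and a^(2k+1) start
# with identical k-token prefixes, but their midpoints (where the parser
# must stop pushing and start popping) differ.
k = 3
w_even = "a" * (2 * k)      # turnaround after k pushes
w_odd = "a" * (2 * k + 1)   # turnaround after k pushes plus a center token
print(w_even[:k] == w_odd[:k])  # True: k tokens cannot tell them apart
```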

Now, if you look at those cases, you can see that if you could somehow both push and not push the nonterminal onto the stack, you could parse the input with the grammar and, in the ambiguous case, produce the set of parses that match the input. There are techniques that do that; I believe they are called GLL (generalized LL), and the equivalent technique with an LR parser generator is called GLR. The resulting output is often called a "parse forest" (or sometimes a parse DAG, a directed acyclic graph).
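The ambiguous case can be made concrete (my own illustration) with the classic grammar E -> E "+" E | "a": the sentence a+a+a has two parse trees, which is exactly the set a generalized parser would return as a parse forest:

```python
# Count the parse trees of "a+a+a" under the ambiguous grammar
#   E -> E "+" E | "a"
# by trying every split at a "+" token (a small CYK-style enumeration).
from functools import lru_cache

tokens = ("a", "+", "a", "+", "a")

@lru_cache(maxsize=None)
def count_parses(i, j):
    """Number of distinct parse trees deriving tokens[i:j] from E."""
    n = 1 if j - i == 1 and tokens[i] == "a" else 0
    for k in range(i + 1, j - 1):  # split E -> E "+" E at tokens[k]
        if tokens[k] == "+":
            n += count_parses(i, k) * count_parses(k + 1, j)
    return n

print(count_parses(0, len(tokens)))  # 2: (a+a)+a and a+(a+a)
```

An LL(1) parser can commit to only one of these two trees; a GLL/GLR parser would keep both.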

[Note: I saw the above question first on Quora and this answer is copied from there.]