How to design a context-free grammar that avoids repetition?

Date: 2019-04-06 12:23:03

Tags: grammar context-free-grammar repeat

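I am learning about context-free grammars, and I would like to know how (if it is possible at all) to design a language that avoids repetition.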

Let's take the SELECT statement in SQL as an example:

possible: 
SELECT * FROM table
SELECT * FROM table WHERE x > 5
SELECT * FROM table WHERE x > 5 ORDER desc
SELECT * FROM table WHERE x > 5 ORDER desc LIMIT 5

impossible (multiple conflicting statements): 
SELECT * FROM table WHERE X > 5 WHERE X > 5

The grammar might look like this:

S -> SW | SO | SL | "SELECT statement"
W -> "WHERE statement"
O -> "ORDER statement"
L -> "LIMIT statement"

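This grammar would allow an impossible statement like the one shown above. How can I design a context-free grammar that rules out impossible statements while remaining flexible?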

Flexible:

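The order of W, O, and L should not matter, and neither should how many of these sub-statements are present. I want to avoid a grammar that simply lists every possible combination, because that gets messy when there are many possibilities.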

2 answers:

Answer 0 (score: 3)

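In a context-free grammar, the set of sentences generated by a non-terminal is the same for every use of that non-terminal. That is what context-free means. A particular non-terminal S cannot allow a match in some places and disallow it in others. So every set of possible matches needs its own non-terminal, and if you restrict a list of k cases to sentences with no repeated case, you need at least 2^k distinct non-terminals, one for each subset of the k cases.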
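To make the 2^k count concrete, here is a minimal sketch (assuming the three clause kinds W, O and L from the question, each treated as a single symbol) that generates such a grammar mechanically. Each non-terminal records which clauses have already been used, so the grammar accepts the clauses in any order but each at most once. The function and non-terminal names (`subset_grammar`, `Rest_...`) are hypothetical, chosen for illustration.

```python
from itertools import combinations


def subset_grammar(clauses):
    """Build a CFG where non-terminal Rest_<used> derives any sequence of the
    clauses NOT in <used>, each at most once, in any order.

    There is one Rest_ non-terminal per subset of clauses: 2**k of them.
    """
    def name(used):
        return "Rest_" + ("".join(sorted(used)) or "none")

    rules = {}
    for size in range(len(clauses) + 1):
        for used in combinations(clauses, size):
            used = frozenset(used)
            # Add any clause not used yet, then continue with the larger subset.
            alternatives = [[c, name(used | {c})] for c in clauses if c not in used]
            alternatives.append([])  # epsilon: stop adding clauses
            rules[name(used)] = alternatives
    rules["S"] = [["SELECT", name(frozenset())]]
    return rules


grammar = subset_grammar(["W", "O", "L"])
for lhs, alts in grammar.items():
    print(lhs, "->", " | ".join(" ".join(a) or "ε" for a in alts))
print(len(grammar) - 1)  # 8 == 2**3 subset non-terminals
```

Adding a fourth clause kind doubles the number of `Rest_` non-terminals to 16, which is exactly the blow-up the answer describes.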

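Worse, if the repetition you want to restrict has an unbounded number of possibilities (for example, you want to allow multiple W clauses but not two identical ones), it cannot be done with a context-free grammar at all. The same is true if you want to require such repetition, which is essentially what it would take for a context-free grammar to insist that variables be declared before use.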

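However, it is easy to do this check in semantic actions, for example by keeping a bit vector of the clauses encountered so far (or a hash set, if the possible clauses are not easy to enumerate). The semantic action that adds a clause to a statement then only needs to check whether that particular clause has already been added, and flag an error if it has. This also gives better error messages, because you can describe the problem precisely at the point where you detect it, instead of just reporting a "syntax error" and leaving the user to guess what went wrong.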
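A minimal sketch of that approach, assuming a toy whitespace tokenizer and treating everything after a clause keyword as that clause's body. The permissive "grammar" accepts clauses in any order, and the action that attaches a clause checks and sets its bit. `CLAUSE_BITS`, `DuplicateClauseError` and `parse_statement` are hypothetical names, not part of any real SQL parser.

```python
# One bit per clause kind; the set of possible clauses is small and enumerable.
CLAUSE_BITS = {"WHERE": 1 << 0, "ORDER": 1 << 1, "LIMIT": 1 << 2}


class DuplicateClauseError(Exception):
    pass


def parse_statement(sql):
    """Accept clauses in any order syntactically; reject repeats semantically."""
    tokens = sql.split()  # toy tokenizer: whitespace-separated tokens only
    if not tokens or tokens[0].upper() != "SELECT":
        raise SyntaxError("statement must start with SELECT")
    seen = 0  # bit vector of clauses already attached to this statement
    clauses = {"SELECT": []}
    current = "SELECT"
    for position, token in enumerate(tokens[1:], start=1):
        keyword = token.upper()
        if keyword in CLAUSE_BITS:
            bit = CLAUSE_BITS[keyword]
            # Semantic action: the parse is fine, but this clause is a repeat.
            if seen & bit:
                raise DuplicateClauseError(
                    f"duplicate {keyword} clause at token {position}")
            seen |= bit
            current = keyword
            clauses[current] = []
        else:
            clauses[current].append(token)  # token belongs to the current clause
    return clauses


print(parse_statement("SELECT * FROM table WHERE x > 5 ORDER desc LIMIT 5"))
# {'SELECT': ['*', 'FROM', 'table'], 'WHERE': ['x', '>', '5'], 'ORDER': ['desc'], 'LIMIT': ['5']}
try:
    parse_statement("SELECT * FROM table WHERE X > 5 WHERE X > 5")
except DuplicateClauseError as err:
    print(err)  # duplicate WHERE clause at token 8
```

The error names the offending clause and its position, which is the kind of targeted message the answer has in mind, rather than a generic syntax error.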

Answer 1 (score: 0)

I am not sure I understand your problem from the grammar. Perhaps you mean statement and S to be the same symbol. If so, I would argue that your grammar is simply not the right one for the language you intend to describe. If we ignore ORDER and LIMIT, your grammar is:

S -> SW | "SELECT S" | foo
W -> "WHERE S"

Then yes, you can derive nonsense like

S -> SW -> SWW -> SWWW -> "SELECT foo WHERE foo WHERE foo WHERE foo"

But this is just your first attempt at a grammar, this does not prove there is no grammar that works. Consider this:

<S> -> <A><B>
<A> -> SELECT <C>
<B> -> epsilon | WHERE <D>
<C> -> (rules for select lists)
<D> -> (rules for WHERE condition)

The rules for <C> and <D> can refer back to <S>, <A>, and <B> where needed (perhaps using parentheses) to produce the strings you want. The bad strings can no longer be derived.
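As a rough illustration, here is a recursive-descent recognizer for this style of grammar, extended (as an assumption, since the answer only spells out WHERE) with optional ORDER and LIMIT clauses in a fixed order. The bodies <C>, <D>, etc. are simplified to "one or more non-keyword tokens", so FROM is just treated as part of the select list. The name `accepts` is hypothetical.

```python
# Assumed fixed-order grammar:
#   <S>     -> SELECT <C> <Where> <Order> <Limit>
#   <Where> -> epsilon | WHERE <D>
#   <Order> -> epsilon | ORDER <E>
#   <Limit> -> epsilon | LIMIT <F>
# <C>, <D>, <E>, <F> are simplified to "one or more non-keyword tokens".
KEYWORDS = {"SELECT", "WHERE", "ORDER", "LIMIT"}
OPTIONAL_CLAUSES = ("WHERE", "ORDER", "LIMIT")  # each at most once, in this order


def accepts(sql):
    tokens = [t.upper() for t in sql.split()]  # case-insensitive keyword matching
    pos = 0

    def body():
        """Consume one or more non-keyword tokens; report whether any were consumed."""
        nonlocal pos
        start = pos
        while pos < len(tokens) and tokens[pos] not in KEYWORDS:
            pos += 1
        return pos > start

    if not tokens or tokens[0] != "SELECT":
        return False
    pos = 1
    if not body():
        return False
    for keyword in OPTIONAL_CLAUSES:
        # Each optional clause is tried exactly once, so it can never repeat.
        if pos < len(tokens) and tokens[pos] == keyword:
            pos += 1
            if not body():
                return False
    return pos == len(tokens)  # leftover tokens (e.g. a second WHERE) mean rejection


print(accepts("SELECT * FROM table WHERE x > 5 ORDER desc LIMIT 5"))  # True
print(accepts("SELECT * FROM table WHERE X > 5 WHERE X > 5"))        # False
```

Note the trade-off: repeats are ruled out cheaply, but so is any other clause order (for example `LIMIT 5 WHERE x > 5` is rejected), which the question wanted to allow.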

This is not really a problem that CFGs cannot handle on their own. Enforcing that only declared variables are used does require context-sensitive (or more powerful) machinery, but here we are only talking about repeated keywords and phrases, which is well within what CFGs can do. If you wanted to support aliases and enforce that the query references them correctly, that would be impossible with a context-free language, but that is not what we are discussing here. It is impossible because the language L = {ww | w ∈ Σ*} is not context-free, and that is essentially what enforcing variable names or table aliases involves.