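Grammar ambiguous or not ambiguous?

Time: 2015-04-09 01:03:10

Tags: grammar bnf

I'm new to BNF and I have a tutorial question I need to solve. Here is the question.

'For each of the following grammars, indicate whether they are ambiguous or not'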

Grammar1:

<T> ::= <T> <Q> 0 | 22
<Q> ::= 2|3 

Grammar2:

<first>::=<first><first><second>
<second>::=<third><second>
<third>::=a|b|c

Grammar3:

<A>::=<B><A><A>
<B>::=1|2|3|4

Can anybody help me find the answers and describe them in an easily understandable way? That would be a great help. Please.

1 answer:

Answer 0 (score: 1)

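To detect an ambiguity in a grammar, you need to exhibit a string that can be parsed in two ways.

Finding such a string can be hard with a big grammar; in fact, it may be impossible, because ambiguity of context-free grammars is undecidable in general.

But you can do this by hand by exploring various sequences of tokens. This gets old fast, and doesn't work in practice if the grammar is anything but trivial.

What you really want to do is build a tool that enumerates possible strings and tries them out to see if there is an ambiguity.

You can do this by brute force by simply generating all strings, but that quickly produces many strings that simply don't parse, and that is no help.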
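To get a feel for how wasteful brute force is, here is a minimal Python sketch for Grammar1. It assumes a hand-derived regular expression for Grammar1's language (`22` followed by any number of `20` or `30` pairs); that regex and the names `ALPHABET`, `IN_GRAMMAR1` and `brute_force` are illustrative assumptions, not part of any tool discussed here.

```python
import re
from itertools import product

ALPHABET = "023"  # the terminals that appear in Grammar1

# Assumption: derived by hand from Grammar1, whose sentences are "22"
# followed by any number of "20" or "30" pairs.
IN_GRAMMAR1 = re.compile(r"22(?:[23]0)*")

def brute_force(max_len):
    """Return (number of strings tried, list of strings Grammar1 accepts)."""
    tried = 0
    accepted = []
    for n in range(1, max_len + 1):
        # Every string of length n over the alphabet, valid or not.
        for chars in product(ALPHABET, repeat=n):
            tried += 1
            s = "".join(chars)
            if IN_GRAMMAR1.fullmatch(s):
                accepted.append(s)
    return tried, accepted

tried, accepted = brute_force(6)
print(tried, len(accepted))
```

Up to length 6 this tries 1092 strings and only 7 of them are sentences of Grammar1, so almost all the work goes into strings that cannot parse.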

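Or, you can generate strings using the grammar as a guide, ensuring that each extension of a proposed string produces something the grammar will still accept. That way all the generated strings are valid, so at least you are producing valid junk.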
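A rough Python sketch of grammar-guided generation, using Grammar1 as input. The dictionary encoding of the grammar and the function name `generate` are my own assumptions for illustration. The sketch also assumes the grammar has no empty rules, so the length of a sentential form is a lower bound on the length of any string it can produce and can be used to cut off the search.

```python
from collections import deque

# Grammar1, encoded as nonterminal -> list of right-hand sides.
GRAMMAR1 = {
    "T": [["T", "Q", "0"], ["2", "2"]],
    "Q": [["2"], ["3"]],
}

def generate(grammar, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start.

    Assumes no empty right-hand sides, so forms longer than max_len
    can be discarded safely.
    """
    results = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Index of the leftmost nonterminal, if any.
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:
            results.add("".join(form))  # only terminals left: a sentence
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + tuple(rhs) + form[idx + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results

print(sorted(generate(GRAMMAR1, "T", 6)))
```

Every string this produces is a sentence of the grammar by construction, which is the point of using the grammar as the guide.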

You can do this as a depth-first search across the grammar rules. You end up mechanizing the following process:

 1.  Pick a pair of rules with the same LHS.
 2.  Instantiate s1 with the RHS of the first rule, s2 with the RHS of the second.
 3.  Repeat until you are tired (hit some search depth):
     a. if s1 == s2, you've found an ambiguity.
     b. if s1 derives a terminal that s2 does not derive,
        then s1 and s2 cannot be ambiguous.
     c. Pick a nonterminal in s1 or s2.
        If there is none, then if s1 <> s2, this path doesn't lead to an ambiguity: backtrack.
     d. Replace the nonterminal with a valid RHS for that nonterminal.
     e. Recurse to a.
 4.  If all branches of the search lead to non-ambiguous strings,
     then this rule isn't ambiguous.
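The procedure above compares pairs of rules. A simpler brute-force variant, which is not the DMS algorithm and is shown only as a sketch, counts leftmost derivations: each distinct leftmost derivation corresponds to a distinct parse tree, so a sentence reached by two of them is ambiguous. The grammar encoding and the name `find_ambiguity` are assumptions for illustration, `E` and `V` are treated as plain tokens because the if-then-else grammar used here does not define them, and the search assumes no empty rules and no unit-rule cycles.

```python
from collections import Counter

# Classic if-then-else grammar; E and V are treated as tokens.
IF_GRAMMAR = {
    "G": [["S"]],
    "S": [["if", "E", "then", "S"],
          ["if", "E", "then", "S", "else", "S"],
          ["V", "=", "E"]],
}

GRAMMAR1 = {
    "T": [["T", "Q", "0"], ["2", "2"]],
    "Q": [["2"], ["3"]],
}

def find_ambiguity(grammar, start, max_len):
    """Return sentences of <= max_len tokens with 2+ leftmost derivations.

    Assumes no empty rules and no unit-rule cycles, so the length cut-off
    is enough to make the search terminate.
    """
    counts = Counter()
    stack = [(start,)]
    while stack:
        form = stack.pop()
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:
            counts[form] += 1  # one more leftmost derivation of this sentence
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + tuple(rhs) + form[idx + 1:]
            if len(new) <= max_len:  # bounded search depth
                stack.append(new)
    return [" ".join(f) for f, n in counts.items() if n > 1]

print(find_ambiguity(IF_GRAMMAR, "G", 13))
print(find_ambiguity(GRAMMAR1, "T", 10))
```

An empty result only means no ambiguity was found within the bound; it proves nothing about longer sentences.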

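The DMS Software Reengineering Toolkit has a parser generator with this capability built in; we can simply try out the grammars. I had to reformulate the grammars slightly to make them compatible with DMS, so I show the new versions here:

Grammar1: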

<T> ::= <T><Q> '0' ;
<T> ::= '2' '2' ;
<Q> ::= '2' ;
<Q> ::= '3' ;

DMS run on Grammar1:

C:\DMS\Domains\DMSStringGrammar\Tools\ParserGenerator\Source>run ..\ParserGenerator.P0B -interactive \temp\Grammar1.bnf
DMS GLR Parser Generator 2.3.5
Copyright (C) 1997-2017 Semantic Designs, Inc.
<<<Rule Collection Completed>>>
NTokens = 6 NRules = 4
*** LR(0) State Machine construction complete ***
States: 8
What next? ambiguities 10
Nonterminal <Q> is not ambiguous
*** Search for ambiguities to depth 1...
*** Search for ambiguities to depth 2...
*** Search for ambiguities to depth 3...
*** Search for ambiguities to depth 4...
 Nonterminal <T> is not ambiguous
*** All ambiguities in grammar detected ***

The tool reports that none of the nonterminals is ambiguous. So, Grammar1 is not ambiguous.

Grammar2:

<first> = <first><first><second> ;
<second> = <third><second> ;
<third> = 'a' ;
<third> = 'b' ;
<third> = 'c' ;

DMS run on Grammar2:

C:\DMS\Domains\DMSStringGrammar\Tools\ParserGenerator\Source>run ..\ParserGenerator.P0B -interactive \temp\Grammar2.bnf
DMS GLR Parser Generator 2.3.5
Copyright (C) 1997-2017 Semantic Designs, Inc.
<<<Rule Collection Completed>>>
NTokens = 7 NRules = 5
*** LR(0) State Machine construction complete ***
Determining if machine is SLR(0), SLR(1) or LALR(1)...
States: 9
Detecting possible cycles...

*** Circular definition:
Rule 1: <first> = <first> <first> <second> ;

*** Circular definition:
Rule 2: <second> = <third> <second> ;

What next? ambiguities 10
Nonterminal <first> is circularly defined
Nonterminal <second> is circularly defined
Nonterminal <third> is not ambiguous
*** Search for ambiguities to depth 1...
*** All ambiguities in grammar detected ***

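This grammar has a problem that OP did not ask about: the nonterminals <first> and <second> are not well defined ("circularly defined" according to this tool). It should be clear that <first> expands to something starting with <first>, but nothing tells us how <first> can expand into concrete literals. So the grammar isn't ambiguous... it is just outright broken.

Grammar3: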

<A> = <B><A><A> ;
<B> = '1' ;
<B> = '2' ;
<B> = '3' ;
<B> = '4' ;

DMS run on Grammar3:

C:\DMS\Domains\DMSStringGrammar\Tools\ParserGenerator\Source>run ..\ParserGenerator.P0B -interactive \temp\Grammar3.bnf
DMS GLR Parser Generator 2.3.5
Copyright (C) 1997-2017 Semantic Designs, Inc.
<<<Rule Collection Completed>>>
NTokens = 7 NRules = 5
LR(0) State Machine Generation Phase.
*** LR(0) State Machine construction complete ***

States: 8

Detecting possible cycles...

*** Circular definition:
Rule 1: <A> = <B> <A> <A> ;

What next? ambiguities 10
Nonterminal <A> is circularly defined
Nonterminal <B> is not ambiguous
*** Search for ambiguities to depth 1...
*** All ambiguities in grammar detected ***

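This grammar is also broken in a way that OP did not discuss. The problem here is that we can find a substitution for <A>, but it leads to an infinite expansion. The grammar isn't ambiguous, but it can only "accept" infinitely long strings (it derives no finite string at all), which isn't useful in practice.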
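The breakage in Grammar2 and Grammar3 can be checked mechanically with the standard "productive nonterminal" computation: a nonterminal is productive if some rule for it consists only of terminals and nonterminals already known to be productive. This is a sketch of that textbook technique, not necessarily how DMS detects circular definitions; the grammar encoding and the name `productive_nonterminals` are assumptions for illustration.

```python
GRAMMAR1 = {
    "T": [["T", "Q", "0"], ["2", "2"]],
    "Q": [["2"], ["3"]],
}
GRAMMAR2 = {
    "first": [["first", "first", "second"]],
    "second": [["third", "second"]],
    "third": [["a"], ["b"], ["c"]],
}
GRAMMAR3 = {
    "A": [["B", "A", "A"]],
    "B": [["1"], ["2"], ["3"], ["4"]],
}

def productive_nonterminals(grammar):
    """Return the nonterminals that can derive at least one finite string."""
    productive = set()
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for nt, rules in grammar.items():
            if nt in productive:
                continue
            for rhs in rules:
                # A rule works if every symbol is a terminal or already productive.
                if all(s not in grammar or s in productive for s in rhs):
                    productive.add(nt)
                    changed = True
                    break
    return productive

for name, g in [("Grammar1", GRAMMAR1), ("Grammar2", GRAMMAR2), ("Grammar3", GRAMMAR3)]:
    print(name, sorted(productive_nonterminals(g)))
```

For Grammar2 only `third` is productive and for Grammar3 only `B`, which matches the nonterminals the tool flags as circularly defined; in Grammar1 both nonterminals are productive.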

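Now, none of these grammars is ambiguous in the sense that OP actually wanted. So here I show a classic ambiguous grammar, based on if-then-else statements with a dangling else:

Grammar4: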

G = S ;
S = 'if' E 'then' S ;
S = 'if' E 'then' S 'else' S ;
S = V '=' E ;

DMS run on Grammar4:

C:\DMS\Domains\DMSStringGrammar\Tools\ParserGenerator\Source>run ..\ParserGenerator.P0B -interactive ..\Tests\ifthenelse_ambiguous.bnf
DMS GLR Parser Generator 2.3.5
Copyright (C) 1997-2017 Semantic Designs, Inc.
Opening ..\Tests\ifthenelse_ambiguous.bnf
<<<Rule Collection Completed>>>
NTokens = 9 NRules = 4
*** LR(0) State Machine construction complete ***

What next? ambiguities 10

Nonterminal G is not ambiguous
*** Search for ambiguities to depth 1...
*** Search for ambiguities to depth 2...

Ambiguous Rules:
S = 'if' E 'then' S 'else' S ; SemanticCopy2
S = 'if' E 'then' S ; SemanticCopy2
Instance: < 'if' E 'then' 'if' E 'then' S 'else' S >
Derivation:
 1: < S 'else' S >
    < S >
 2: < 'if' E 'then' S 'else' S >
    < 'if' E 'then' S 'else' S >
*** All ambiguities in grammar detected ***

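The search finds an ambiguous instance phrase for statements. If you look at the instance phrase, you should see that there is one else clause... and the grammar allows it to attach itself to either if-then statement.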
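To make the two attachments visible, the following Python sketch enumerates every parse tree of one concrete dangling-else sentence. It is a naive backtracking enumerator that only works because this grammar has no left recursion; the names `parses`, `seq` and `show` are hypothetical, and `E` and `V` are treated as plain tokens since the grammar does not define them.

```python
IF_GRAMMAR = {
    "S": [["if", "E", "then", "S"],
          ["if", "E", "then", "S", "else", "S"],
          ["V", "=", "E"]],
}

def parses(grammar, sym, tokens, i):
    """Yield (tree, next_index) for each way sym matches tokens from index i."""
    if sym not in grammar:  # terminal: must match the current token
        if i < len(tokens) and tokens[i] == sym:
            yield sym, i + 1
        return
    for rhs in grammar[sym]:
        for children, j in seq(grammar, rhs, tokens, i):
            yield (sym, children), j

def seq(grammar, rhs, tokens, i):
    """Yield (children, next_index) for each way the sequence rhs matches."""
    if not rhs:
        yield (), i
        return
    for tree, j in parses(grammar, rhs[0], tokens, i):
        for rest, k in seq(grammar, rhs[1:], tokens, j):
            yield (tree,) + rest, k

def show(tree):
    """Render a tree with one pair of brackets per S node."""
    if isinstance(tree, str):
        return tree
    _, children = tree
    return "[" + " ".join(show(c) for c in children) + "]"

tokens = "if E then if E then V = E else V = E".split()
trees = [t for t, j in parses(IF_GRAMMAR, "S", tokens, 0) if j == len(tokens)]
for t in trees:
    print(show(t))
```

The sentence has exactly two trees: one where the else belongs to the inner if, `[if E then [if E then [V = E] else [V = E]]]`, and one where it belongs to the outer if, `[if E then [if E then [V = E]] else [V = E]]`.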

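For really tiny grammars, you don't need a tool like this; you can do it by staring at the rules and working it out. But for a big grammar, that is hard, and this is where a tool like this is really useful.

Consider a run with our Java version 8 grammar, containing 400+ rules: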

C:\DMS\Domains\DMSStringGrammar\Tools\ParserGenerator\Source>run ..\ParserGenerator.P0B -interactive "C:\DMS\Domains\Java\v8\tools\Parser\Source\Syntax\%Java~v8.bnf"
DMS GLR Parser Generator 2.3.5
Copyright (C) 1997-2017 Semantic Designs, Inc.
Opening C:\DMS\Domains\Java\v8\tools\Parser\Source\Syntax\%Java~v8.bnf
<<<Rule Collection Completed>>>
NTokens = 243 NRules = 410
*** LR(0) State Machine construction complete ***
States: 774

What next? ambiguities 15

Nonterminal optional_CONTROL_Z is not ambiguous
Nonterminal package_name_declaration is not ambiguous
Nonterminal anonymous_class_creation is not ambiguous
Nonterminal annotation_type_declaration is not ambiguous
Nonterminal annotation_interface_header is not ambiguous
Nonterminal default_value is not ambiguous
Nonterminal field_declaration is not ambiguous
Nonterminal member_value is not ambiguous
Nonterminal marker_annotation is not ambiguous
Nonterminal single_member_annotation is not ambiguous
Nonterminal enum_body is not ambiguous
Nonterminal type_parameters is not ambiguous
Nonterminal modifier is not ambiguous
Nonterminal local_variable_declaration is not ambiguous
Nonterminal vararg_parameter is not ambiguous
Nonterminal variable_declarator_id is not ambiguous
Nonterminal variable_initializer is not ambiguous
Nonterminal primitive_type is not ambiguous
Nonterminal try_resource_list_opt is not ambiguous
Nonterminal catch_statements_opt is not ambiguous
Nonterminal finally_statement_opt is not ambiguous
Nonterminal finally_statement is not ambiguous
Nonterminal switch_group is not ambiguous
Nonterminal switch_label is not ambiguous
Nonterminal catch_statement is not ambiguous
Nonterminal catch_parameter is not ambiguous
Nonterminal literal is not ambiguous
Nonterminal array_dims is not ambiguous
Nonterminal array_creation_with_initialization is not ambiguous
Nonterminal dim_spec is not ambiguous
Nonterminal superpath is not ambiguous
Nonterminal thispath is not ambiguous
Nonterminal target is not ambiguous
Nonterminal unary_expression is not ambiguous
Nonterminal lambda_body is not ambiguous
Nonterminal right_angle is not ambiguous
*** Search for ambiguities to depth 1...
 Nonterminal type_declarations is not ambiguous
 Nonterminal annotations_opt is not ambiguous
 Nonterminal modifiers is not ambiguous
 Nonterminal brackets is not ambiguous
 Nonterminal switch_groups is not ambiguous
 Nonterminal bounds_list is not ambiguous
*** Search for ambiguities to depth 2...
 Nonterminal class_body is not ambiguous
 Nonterminal arguments is not ambiguous
 Nonterminal annotation_type_body is not ambiguous
 Nonterminal qualified_name is not ambiguous
 Nonterminal interface_body is not ambiguous
 Nonterminal enum_class_header is not ambiguous
 Nonterminal enum_class_body_opt is not ambiguous
 Nonterminal block is not ambiguous

Ambiguous Rules:
executable_statement = 'if' '(' expression ')' executable_statement 'else' executable_statement ; SemanticCopy2
executable_statement = 'if' '(' expression ')' executable_statement ; SemanticCopy2
Instance: < 'if' '(' expression ')' 'if' '(' expression ')' executable_statement 'else' executable_statement >
Derivation:
 1: < executable_statement 'else' executable_statement >
< executable_statement >
 2: < 'if' '(' expression ')' executable_statement 'else' executable_statement >
< 'if' '(' expression ')' executable_statement 'else' executable_statement >
 Nonterminal variable_declarator is not ambiguous
 Nonterminal try_resource_list is not ambiguous
*** Search for ambiguities to depth 3...
 Nonterminal type_arguments is not ambiguous
 Nonterminal member_value_pair is not ambiguous
*** Search for ambiguities to depth 4...
 Nonterminal array_creation_no_initialization is not ambiguous
 Nonterminal array_creation_with_initialization_header is not ambiguous
*** Search for ambiguities to depth 5...
 Nonterminal compilation_unit is not ambiguous
 Nonterminal nested_class_declaration is not ambiguous
 Nonterminal interface_header is not ambiguous
*** Search for ambiguities to depth 6...
 Nonterminal name is not ambiguous
 Nonterminal normal_annotation is not ambiguous
 Nonterminal type_parameter is not ambiguous
*** Search for ambiguities to depth 7...
 Nonterminal enum_constant is not ambiguous
 Nonterminal bound is not ambiguous
*** Search for ambiguities to depth 8...
 Nonterminal annotation is not ambiguous
 Nonterminal type is not ambiguous
 Nonterminal catch_statements is not ambiguous
 Nonterminal value_suffix is not ambiguous

Ambiguous Rules:
method_reference = type '::' type_arguments IDENTIFIER ; SemanticCopy2
method_reference = primary '::' type_arguments IDENTIFIER ; SemanticCopy2
Instance: < IDENTIFIER '::' type_arguments IDENTIFIER >
Derivation:
 1: < type '::' type_arguments IDENTIFIER >
< primary '::' type_arguments IDENTIFIER >
 2: < type >
< primary >
 3: < name brackets >
< primary >
 4: < annotations_opt IDENTIFIER type_arguments brackets >
< primary >
 5: < IDENTIFIER type_arguments brackets >
< primary >
 6: < IDENTIFIER type_arguments brackets >
< primary_not_new_array >
 7: < IDENTIFIER type_arguments brackets >
< IDENTIFIER >
 8: < type_arguments brackets >
< >
*** Search for ambiguities to depth 9...
 Nonterminal enum_constants is not ambiguous
 Nonterminal type_argument is not ambiguous
*** Search for ambiguities to depth 10...
 Nonterminal parameter is not ambiguous
*** Search for ambiguities to depth 11...
 Nonterminal class_header is not ambiguous
 Nonterminal nested_interface_declaration is not ambiguous
*** Search for ambiguities to depth 12...
 Nonterminal import_statement is not ambiguous
 Nonterminal type_declaration is not ambiguous
 Nonterminal name_list is not ambiguous
 Nonterminal variable_declarator_list is not ambiguous
 Nonterminal formal_name_list is not ambiguous
*** Search for ambiguities to depth 13...
*** Search for ambiguities to depth 14...
 Nonterminal method_declaration is not ambiguous

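This took about 5 minutes to run, because it is computing an exponentially growing set of instance strings. But we learn:

1) Java has the dangling else problem, too! (In our parser, this is handled by a "prefer shift on 'else'" rule, which this ambiguity detector doesn't know about.)

2) The grammar rule for method_reference is ambiguous. I think it is that way in the actual Java standard, too. This is actually handled in our parser, in the name resolver, by looking at the type of the IDENTIFIER.

It is easy to talk about a tool like this, but much harder to write it and make it handle large grammars. I ran our 3000-rule COBOL grammar through our tool, and it checked some 480 billion different string expansions. I still don't know whether that entire grammar is ambiguous. (It did catch some silly things that we fixed.)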