Implementing the flex «syntax» using the flex and bison tools

Time: 2014-10-13 19:58:11

Tags: bison yacc flex-lexer lex

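I have run into a problem. I need to implement a tool that parses flex syntax, and for this I wrote a context-free grammar. Here are some examples of input that I should be able to handle: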

№1:

Rname 0|1
String1 {Rname}{Rname}*(111)
String2 {Rname}{Rname}*(000)
%%

№2:

%%
Rname   0|1
String1 {Rname}{Rname}*(111)
String2 {Rname}{Rname}*(000)
%%
{String1}   {return 1;}
{String2}   {return 1;}
%%

№3:

Name        [a-zA-Z][a-zA-Z0-9]*
Words       [a-zA-Z0-9]*
%%
{Identifier} {return AP_Name;}
{Words} {return AP_Words;}
%%

The most complex example, which covers all the options:

A "abc" | "cba"
B "qwe"
C 111
D rty* | rty+
E ({C | D})*
%%
{String1} {return 1;}
{String2} {return 1;}

{A} {return AP_A;}
{B} {return AP_B;}
{C} {return AP_C;}
{D} {return AP_D;}
{E} {return AP_E;}
%%

I have a flex file that looks like this:

%option noyywrap
%option yylineno
%option never-interactive
%{
#include <stdio.h>
#include "bison.tab.h"
%}

%%
[a-zA-Z][a-zA-Z0-9]*     {return AP_Name;}
[a-zA-Z0-9]*         {return AP_Words;}

\(      {return AP_Bracket_open1;}
\)      {return AP_Bracket_close1;}

\[      {return AP_Bracket_open2;}
\]      {return AP_Bracket_close2;}

\{      {return AP_Bracket_open3;}
\}      {return AP_Bracket_close3;}

\+      {return AP_Plus;}
\*      {return AP_Multiply;}
\ '|'       {return AP_Or;}
'%%'        {return AP_Percentage;}
\;      {return AP_Semicolon;}
\-      {return AP_Dash;}
\"      {return AP_Quote;}
%%

And a bison file as follows:

%{
#include <stdio.h>
extern int yylineno;
void yyerror(char const *msg)
{
    fprintf(stderr, "%d: %s\n", yylineno, msg);
}
int yylex(void);   /* provided by the flex-generated scanner */
int yyparse(void);
#define YYPRINT(file, type, value) fprintf(file, "%d", value);
%}
%token AP_Name
%token AP_Words
%token AP_Bracket_open1
%token AP_Bracket_close1
%token AP_Bracket_open2
%token AP_Bracket_close2
%token AP_Bracket_open3
%token AP_Bracket_close3
%token AP_Plus
%token AP_Multiply
%token AP_Or
%token AP_Percentage
%token AP_Semicolon
%token AP_Dash
%token AP_Quote
%%
S : Block1 AP_Percentage Block2 AP_Percentage;

Identifier : AP_Name | AP_Words;

Block1 : AP_Name Patern Block1 | ;
Patern : Regex | Regex Patern;

Regex  : Value Plurality /* abc* */ /* abc+ */
       | AP_Bracket_open1 Value AP_Bracket_close1 Plurality  /* (abc) */ /* (abc)* */ /* (abc)+ */
       | AP_Bracket_open2 Value AP_Bracket_close2 Plurality  /* [abc] */ /* [abc]* */ /* [abc]+ */
       | AP_Bracket_open3 Value AP_Bracket_close3 Plurality; /* {abc} */ /* {abc}* */ /* {abc}+ */

Plurality : AP_Plus | AP_Multiply |; /* + * */

Value : Identifier /* abc */
      | AP_Quote Identifier AP_Quote /* "abc" */
      | Identifier AP_Or Identifier /* abc | cba*/
      | Identifier AP_Dash Identifier; /*abc - cba*/

Block2 : AP_Bracket_open3 Identifier AP_Bracket_close3 /* {abc} */
       | AP_Bracket_open3 Identifier AP_Bracket_close3 AP_Bracket_open3 Identifier AP_Bracket_close3 /* {abc}{abc} */
       |
       ;
%%
extern FILE *yyin;
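int main(void)
{
    yydebug = 1;                  /* print the parser trace (requires bison -t) */
    yyin = fopen("test.txt", "r");
    if (yyin == NULL)
    {
        perror("test.txt");       /* the input file could not be opened */
        return 1;
    }
    if (yyparse() != 0)
        return 1;                 /* syntax error: report failure to the caller */
    printf("Success\n");
    return 0;
}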

To build it, I use the following set of commands:

flex lex.l
bison bison.y -d -t
cc lex.yy.c bison.tab.c bison.tab.h

Now I am trying to process my first example, and at the end I get the following error:

Reducing stack by rule 4 (line 31):
   $1 = token AP_Name (0)
   $2 = nterm Patern ()
   $3 = nterm Block1 ()
-> $$ = nterm Block1 ()
Stack now 0
Entering state 3
Now at end of input.
1: syntax error
Error: popping nterm Block1 ()
Stack now 0
Cleanup: discarding lookahead token $end (0)
Stack now 0

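Am I right that bison tries to read up to the end of the file, sees the end of the file, and reports the error because reading has finished? The grammar itself is not complete yet (I am not sure I know how to complete it), but I would like it to work at least at this stage.
Sorry for my English.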

UPD:

I found the rule that causes the error: it is Block1 : AP_Name Patern Block1 | ; When I tried to rewrite the grammar, I got this:

%token AP_Name
%token AP_Word
%token AP_Percentage
%token AP_Plus
%token AP_Or
%token AP_Multiply
%token AP_Bracket_open1
%token AP_Bracket_close1
%token AP_Bracket_open2
%token AP_Bracket_close2
%token AP_Bracket_open3
%token AP_Bracket_close3
%token AP_Dash
%token AP_Quote
%%
S : Block1 AP_Percentage Block2 AP_Percentage;

Identifier : AP_Word | AP_Name;

Block1 : AP_Name Regex Block1 | ;

Regex : BracketO Regex
      | Identifier AP_Or Regex /* a|b */
      | Identifier Regex
      | Identifier Plurality Regex /* abc+ or abc* */
      | AP_Quote Identifier AP_Quote /* "abc" */
      |
      ;

BracketO : AP_Bracket_open1 Class BracketC Plurality BracketO /* (abc) */
         | AP_Bracket_open2 Class BracketC Plurality BracketO /* [abc] */
         | AP_Bracket_open3 AP_Name AP_Bracket_close3 Plurality BracketO /* {abc} */
         |
         ;

BracketC : AP_Dash Class BracketC /* [a-b] */
         | AP_Or Class BracketC /* [a|b] or (a|b) */
         | AP_Bracket_close1 /* ((abc)) */
         | AP_Bracket_close2; /* [[abc]] */

Plurality : AP_Multiply | AP_Plus | ; /* * + */

Class : Identifier | BracketO; /* ({Rname}) */

Block2 : AP_Bracket_open3 Identifier AP_Bracket_close3 /* {abc} */
       | AP_Bracket_open3 Identifier AP_Bracket_close3 AP_Bracket_open3 Identifier AP_Bracket_close3 /* {abc} {cba} */
       |
       ;
%%

I don't like it, but it works, with 36 shift/reduce conflicts. (This is what I wrote in the comments.)

1 answer:

Answer 0 (score: 1)

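No, it is not the reading of EOF that fails. The problem is that you are trying to get too many rules working at once. Here is a suggestion to help you get started: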

First, in bison.y, comment out or delete all the rules and replace them with

S:      AP_Bracket_open1 Identifier AP_Bracket_close1;

Identifier : AP_Name | AP_Words;

Recompile, and test the following inputs one at a time:

( c )

( 5 )

Both should work! The last line of the parser's output should say Success.

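Next, you should verify that your Value symbol works, because its rules contain only tokens and symbols that have already been verified (in this case only Identifier). Delete or comment out all the rules and replace them with: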

S:      Value;

Identifier : AP_Name | AP_Words;

Value : Identifier
      | AP_Quote Identifier AP_Quote
      | Identifier AP_Or Identifier
      | Identifier AP_Dash Identifier
      ;

Recompile, and test at least the following input combinations:

c
5
"c"
"5"
c | 5
c - 5
5 | c
5 - c

Yes, test every possible combination, and more (try inserting spaces and newlines).
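
While running these tests, it is worth checking what the scanner actually returns for the operator characters. In flex, single quotes are ordinary characters; only double quotes or a backslash quote a literal. A pattern written as '%%' therefore matches the four characters '%%' including the quotes, not a bare %%. Input that no rule matches is copied to stdout by flex's default rule instead of being returned as a token, which would be consistent with the trace reaching end of input at the point where AP_Percentage was expected. Below is a minimal sketch of rules that could stand in for the '|' and '%%' rules, using the token names from the question. The whitespace rule and the catch-all rule are assumptions added for illustration and are not part of the original lexer:

```lex
"|"          { return AP_Or; /* double quotes quote the literal character */ }
"%%"         { return AP_Percentage; }
[ \t\r\n]+   { /* assumption: whitespace is not significant, so skip it silently */ }
.            { fprintf(stderr, "%d: unexpected character '%s'\n", yylineno, yytext); }
```

With a catch-all rule like the last one, any character the scanner does not recognise is reported with its line number rather than silently echoed, so a missing token shows up during the small tests instead of as a puzzling syntax error later.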

Now continue in this order: Plurality, Regex, Patern, Block1, Block2. Sooner or later you will find your shift/reduce conflicts as well.
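
As an illustration of the next step, the rules section might look like the sketch below once Value has been verified. The Identifier, Plurality and Value rules are taken from the question; the start rule S : Value Plurality is a hypothetical, temporary rule chosen here only so that Plurality can be exercised on its own:

```yacc
S : Value Plurality ;            /* temporary start rule, for testing only */

Identifier : AP_Name | AP_Words ;

Plurality : AP_Plus | AP_Multiply | /* empty */ ;

Value : Identifier
      | AP_Quote Identifier AP_Quote
      | Identifier AP_Or Identifier
      | Identifier AP_Dash Identifier
      ;
```

Inputs such as c+ and "c"* are reasonable first tests for this step. Running bison with -v in addition to -d -t writes a bison.output file that lists every parser state and any shift/reduce conflicts, which makes it easier to see which newly added rule introduced a conflict.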

Good luck!