flex & bison shift/reduce conflict

Date: 2017-03-11 09:33:30

Tags: grammar bison

Here is part of my grammar:

expr_address
: expr_address_category expr_opt    { $$ = new ExprAddress($1,*$2);}
| axis axis_data                    { $$ = new ExprAddress($1,*$2);}
;
axis_data
: expr_opt          { $$ = $1;}
| sign              { if($1 == MINUS)
                        $$ = new IntergerExpr(-1000000000);
                      else if($1 == PLUS)
                        $$ = new IntergerExpr(+1000000000);}
;
expr_opt
:               { $$ = new IntergerExpr(0);}
| expr          { $$ = $1;}
;
expr_address_category
: I                 { $$ = NCAddress_I;}
| J                 { $$ = NCAddress_J;}
| K                 { $$ = NCAddress_K;}
;
axis
: X                 { $$ = NCAddress_X;}
| Y                 { $$ = NCAddress_Y;}
| Z                 { $$ = NCAddress_Z;}
| U                 { $$ = NCAddress_U;}
| V                 { $$ = NCAddress_V;}
| W                 { $$ = NCAddress_W;}
;
expr
: '[' expr ']'              {$$ = $2;}
| COS parenthesized_expr    {$$ = new BuiltinMethodCallExpr(COS,*$2);}
| SIN parenthesized_expr    {$$ = new BuiltinMethodCallExpr(SIN,*$2);}
| ATAN parenthesized_expr   {$$ = new BuiltinMethodCallExpr(ATAN,*$2);}
| SQRT parenthesized_expr   {$$ = new BuiltinMethodCallExpr(SQRT,*$2);}
| ROUND parenthesized_expr  {$$ = new BuiltinMethodCallExpr(ROUND,*$2);}
| variable                  {$$ = $1;}
| literal                           
| expr '+' expr                 {$$ = new BinaryOperatorExpr(*$1,PLUS,*$3);}
| expr '-' expr                 {$$ = new BinaryOperatorExpr(*$1,MINUS,*$3);}
| expr '*' expr                 {$$ = new BinaryOperatorExpr(*$1,MUL,*$3);}
| expr '/' expr                 {$$ = new BinaryOperatorExpr(*$1,DIV,*$3);}
| sign expr %prec UMINUS        {$$ = new UnaryOperatorExpr($1,*$2);}
| expr EQ expr                  {$$ = new BinaryOperatorExpr(*$1,EQ,*$3);}
| expr NE expr                  {$$ = new BinaryOperatorExpr(*$1,NE,*$3);}
| expr GT expr                  {$$ = new BinaryOperatorExpr(*$1,GT,*$3);}
| expr GE expr                  {$$ = new BinaryOperatorExpr(*$1,GE,*$3);}
| expr LT expr                  {$$ = new BinaryOperatorExpr(*$1,LT,*$3);}
| expr LE expr                  {$$ = new BinaryOperatorExpr(*$1,LE,*$3);}
;
variable 
: d_h_address               {$$ = new AddressExpr(*$1);}
;
d_h_address
: D INTEGER_LITERAL     { $$ = new IntAddress(NCAddress_D,$2);}
| H INTEGER_LITERAL     { $$ = new IntAddress(NCAddress_H,$2);}
;

I want my grammar to accept input like the following, where X; and X+0; mean the same as X0 (by the way, sign is + or -):

H100=20;
X;
X+0;
X+;
X+H100;   //means H100 variable ref

But bison reports conflicts, and I don't know how to deal with them. Thanks in advance. Here is the key part of bison.output:

State 108

11 expr: sign . expr
64 axis_data: sign .

INTEGER_LITERAL  shift, and go to state 93
REAL_LITERAL     shift, and go to state 94
'+'              shift, and go to state 74
'-'              shift, and go to state 75
COS              shift, and go to state 95
SIN              shift, and go to state 96
ATAN             shift, and go to state 97
SQRT             shift, and go to state 98
ROUND            shift, and go to state 99
D                shift, and go to state 35
H                shift, and go to state 36
'['              shift, and go to state 100

D         [reduce using rule 64 (axis_data)]
H         [reduce using rule 64 (axis_data)]
$default  reduce using rule 64 (axis_data)

State 69

62 expr_address: axis . axis_data

INTEGER_LITERAL  shift, and go to state 93
REAL_LITERAL     shift, and go to state 94
'+'              shift, and go to state 74
'-'              shift, and go to state 75
COS              shift, and go to state 95
SIN              shift, and go to state 96
ATAN             shift, and go to state 97
SQRT             shift, and go to state 98
ROUND            shift, and go to state 99
D                shift, and go to state 35
H                shift, and go to state 36
'['              shift, and go to state 100

D         [reduce using rule 65 (expr_opt)]
H         [reduce using rule 65 (expr_opt)]
$default  reduce using rule 65 (expr_opt)

State 68

61 expr_address: expr_address_category . expr_opt

INTEGER_LITERAL  shift, and go to state 93
REAL_LITERAL     shift, and go to state 94
'+'              shift, and go to state 74
'-'              shift, and go to state 75
COS              shift, and go to state 95
SIN              shift, and go to state 96
ATAN             shift, and go to state 97
SQRT             shift, and go to state 98
ROUND            shift, and go to state 99
D                shift, and go to state 35
H                shift, and go to state 36
'['              shift, and go to state 100

D         [reduce using rule 65 (expr_opt)]
H         [reduce using rule 65 (expr_opt)]
$default  reduce using rule 65 (expr_opt)

EDIT: I made a minimal grammar:

%{
#include <stdio.h>
extern "C" int yylex();
void yyerror(const char *s) { printf("ERROR: %s\n", s); }
%}

%token PLUS '+'  MINUS '-' 

%token D H I J K X Y Z INT

/*%type sign expr var expr_address_category expr_opt
%type axis */

%start word_list

%%
/*Above grammar lost this rule,it makes ambiguous*/
word_list
    : word
    | word_list word
    ;
sign
    : PLUS
    | MINUS
    ;
expr
    : var
    | sign expr
    | '[' expr ']'
    ;
var 
    : D INT
    | H INT
    ;
word
    : expr_address
    | var '=' expr
    ;
expr_address
    : expr_address_category expr_opt
    /*| '(' axis sign ')'*/
    | axis sign
    ;
expr_opt
    : /* empty */
    | expr
    ;
expr_address_category
    : I 
    | J
    | K
    | axis
    ;
axis
    : X
    | Y
    | Z
    ;
%%

I hope it can support:

X;
X0;
X+0;  //the top three are same with X0
X+;
X+H100;  //this means X's data is ref +H100;
X+H100=10; //two word on a block,X+ and H100=10;
XH100=10;  //two word on a block,X and H100=10;

EDIT2: The above edit lost this rule, which is needed because blocks like the ones above must be allowed:

block
    : word_list ';' 
    | ';'
    ;

2 Answers:

Answer 0 (score: 0)

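The main problem is that it cannot tell where one word in the word_list ends and the next begins, because there is no separator token between words. This contrasts with your examples, which all have ';' terminators. That suggests an obvious fix: put in the ';' separators:

word: expr_address ';' | var '=' expr ';'

This fixes most of the problems, but it leaves a lookahead conflict: when the lookahead is a sign, it cannot decide whether an axis is an expr_address_category, because that depends on whether there is an expr after the sign. You can fix that by refactoring to defer the decision:

expr_address : expr_address_category expr_opt | axis expr_opt | axis sign

..and removing axis from expr_address_category.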

Answer 1 (score: 0)

This is basically the classic LR(2) grammar, except that in your case it is LR(3), because your variables consist of two tokens [Note 1]:

var : D INT | H INT

The basic problem is the concatenation of words without separators:

word_list : word | word_list word

combined with the fact that one of the options for word ends with an optional expr:

word: expr_address
expr_address: expr_address_category expr_opt

while the other one starts with a var:

word: var '=' expr

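The '=' makes this unambiguous, since nothing in an expr can contain that symbol. But at the point where the decision has to be made, the '=' is not visible, because the lookahead is the first token of a var (either an H or a D) and the equals sign is still two tokens away.

This LR(2) grammar is very similar to the grammar used by yacc/bison itself, a fact I always find ironic, because the grammar for yacc does not require a ';' between productions:

production: SYMBOL ':' | production SYMBOL /* Lots of detail omitted */

As with your grammar, this makes it impossible to know whether a SYMBOL should be shifted or should trigger a reduction, because the disambiguating ':' is not yet visible.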

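Since the grammar is (I assume) unambiguous, and bison can now generate GLR parsers, that is the simplest fix: just add

%glr-parser

to the declarations at the top of your bison file (but read the section of the bison manual on GLR parsers to understand the trade-offs).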

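Note that the shift-reduce conflicts will still be reported as warnings. Since it is impossible to decide reliably whether a grammar is ambiguous, bison does not attempt to do so; if ambiguities exist, they are reported at run time.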

You should also fix the problem mentioned in @ChrisDodd's answer about refactoring expr_address (although with a GLR parser that is not strictly necessary).

If for some reason you feel that a GLR parser will not meet your needs, you can use the solution found in most implementations of yacc (including bison), which is a hack in the lexical scanner. The basic idea is to have the lexer mark whether a symbol is followed by a colon, so that the production above can be rewritten as:

production: SYMBOL_COLON | production SYMBOL

This solution would work for you if you were willing to combine the letter and the number into a single token:

word: expr_address expr_opt
    | VARIABLE_EQUALS expr
// ...
expr: VARIABLE

My preference is to do this transformation in a wrapper around the lexer, which keeps a (one-element) queue of pending tokens:

/* The use of static variables makes this yylex wrapper unreliable
 * if it is reused after a syntax error.
 */
int yylex_wrapper() {
  static int saved_token = -1;
  static YYSTYPE saved_yylval = {0};

  int token = saved_token;
  saved_token = -1;
  yylval = saved_yylval;
  // Read a new token only if we don't have one in the queue.
  if (token < 0) token = yylex();
  // If the current token is IDENTIFIER, check the next token
  if (token == IDENTIFIER) {
    // Read the next token into the queue (saved_token / saved_yylval)
    YYSTYPE temp_val = yylval;
    saved_token = yylex();
    saved_yylval = yylval;
    yylval = temp_val;
    // If the second token is '=', then modify the current token
    // and delete the '=' from the queue
    if (saved_token == '=') {
        saved_token = -1;
        token = IDENTIFIER_EQUALS;
    }
  }
  return token;
}

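Notes

  1. Personally, I would start by making var a single token. Do you really want to let people write:

    H /* Some comment in the middle of the variable name */ 100

    That will not solve the problem, though; it only reduces the grammar's lookahead requirement from LR(3) to LR(2).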