Displaying the line number before the tokens in flex

Date: 2016-01-17 19:41:47

Tags: compiler-construction flex-lexer

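I am using flex to read the contents of a cminus file and then display them in the following format: line number: token. I can display the tokens, but as soon as I try to display the line numbers as well, the line numbers are all I see. My flex file: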

%option noyywrap
%option yylineno
%{
#include <stdio.h>
int lineNo = 1;
%}
line ^.*\n
letter [a-zA-Z]
digit  [0-9]


%x IN_COMMENT
%%

{line} {printf("%d:\n", lineNo++);} 
{digit}+    {
            printf("found NUM token\n");
            }
"while"    {
            printf("found WHILE token\n");
            }
"else"    {
            printf("found ELSE token\n");
            }
"if"    {
            printf("found IF token\n");
            }
"return"    {
            printf("found RETURN token\n");
            }
"void"    {
            printf("found VOID token\n");
            }
"int"    {
            printf("found INT token\n");
            }
"+"    {
            printf("found PLUS token\n");

            }
"-"    {
            printf("found MINUS token\n");

            }
"*"    {
            printf("found TIMES token\n");

            }
"/"    {
            printf("found OVER token\n");

            }
"<"    {
            printf("found LT token\n");

            }
"<="    {
            printf("found LTEQ token\n");
            }
">"    {
            printf("found GT token\n");
            }
">="    {
            printf("found GTEQ token\n");
            }
"=="    {
            printf("found EQ token\n");
            }
"!="    {
            printf("found NEQ token\n");
            }
"="    {
            printf("found ASSIGN token\n");
            }
";"    {
            printf("found SEMI token\n");

            }
","    {
            printf("found COMMA token\n");

            }
"("    {
            printf("found LPAREN token\n");

            }
")"    {
            printf("found RPAREN token\n");

            }
"["    {
            printf("found LBRACKET token\n");

            }
"]"    {
            printf("found RBRACKET token\n");

            }
"{"    {
            printf("found LBRACE token\n");

            }
"}"    {
            printf("found RBRACE token\n");

            }


[ \t]+
<INITIAL>{
"/*"              BEGIN(IN_COMMENT);
}
<IN_COMMENT>{
"*/"      BEGIN(INITIAL);
[^*\n]+   // eat comment in chunks
"*"       // eat the lone star
\n        yylineno++;
}

{letter}{letter}*  {
            printf("found ID token\n");
            }
. {printf("Unrecognized character");}
%%

int main( int argc, char **argv )
{
++argv, --argc;
if ( argc > 0 )
     yyin = fopen( argv[0], "r" );
else
     yyin = stdin;
yylex();
}

My input file:

/* Sample program
  in CMinus language -
  computes factorial
*/
void main (void)
{
   int x;
   int whileimatit;

   /* read x; { input an integer } */
   x = input();

   /* if x > 0 then { don't compute if x <= 0 } */
   if ( x > 0 ) {
      /*     fact := 1; */
      whileimatit = 1;
      /*   repeat */
      while (x > 0)
      {
     /*     fact := fact * x; */
     whileimatit = whileimatit * x;
     /*     x := x - 1 */
     x = x - 1;
     /*   until x = 0; */
      }
      /* write fact  { output factorial of x } */
      output(whileimatit);

   /* end */
   }
}

My output:

1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:
28:
29:
30:
31:

Expected output:

1:
2:
3:
4:
5:
found VOID token
found ID token
found LPAREN token
found VOID token
found RPAREN token
6:
found LBRACE token
7:
found INT token
found ID token
found SEMI token
8:
found INT token
found ID token
found SEMI token
9:
10:
11:
found ID token
found ASSIGN token
found ID token
found LPAREN token
found RPAREN token
found SEMI token
12:
13:
14:
found IF token
found LPAREN token
found ID token
found GT token
found NUM token
found RPAREN token
found LBRACE token
15:
16:
found ID token
found ASSIGN token
found NUM token
found SEMI token
17:
18:
found WHILE token
found LPAREN token
found ID token
found GT token
found NUM token
found RPAREN token
19:
found LBRACE token
20:
21:
found ID token
found ASSIGN token
found ID token
found TIMES token
found ID token
found SEMI token
22:
23:
found ID token
found ASSIGN token
found ID token
found MINUS token
found NUM token
found SEMI token
24:
25:
found RBRACE token
26:
27:
found ID token
found LPAREN token
found ID token
found RPAREN token
found SEMI token
28:
29:
30:
found RBRACE token
31:
found RBRACE token

If I remove the following line:

{line} {printf("%d:\n", lineNo++);} 

I get the following output:

found VOID token
found ID token
found LPAREN token
found VOID token
found RPAREN token

found LBRACE token

found INT token
found ID token
found SEMI token

found INT token
found ID token
found SEMI token



found ID token
found ASSIGN token
found ID token
found LPAREN token
found RPAREN token
found SEMI token



found IF token
found LPAREN token
found ID token
found GT token
found NUM token
found RPAREN token
found LBRACE token


found ID token
found ASSIGN token
found NUM token
found SEMI token


found WHILE token
found LPAREN token
found ID token
found GT token
found NUM token
found RPAREN token

found LBRACE token


found ID token
found ASSIGN token
found ID token
found TIMES token
found ID token
found SEMI token


found ID token
found ASSIGN token
found ID token
found MINUS token
found NUM token
found SEMI token


found RBRACE token


found ID token
found LPAREN token
found ID token
found RPAREN token
found SEMI token



found RBRACE token

found RBRACE token

I cannot get the line numbers printed together with the tokens. Can anyone help?

1 answer:

Answer 0 (score: 0)

You defined line as

line ^.*\n

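which means that it matches an entire line. And that is exactly what happens. Flex always takes the longest match, and at the start of a line no other rule can match more text than the whole line, so every line is matched as a line token and none of the other rules is ever used.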

You could drop the line definition [Note 1] and use a pattern/action rule instead:

\n    {printf("%d:\n", lineNo++);} 

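However, that rule fires at the end of a line rather than at the beginning. It also does not fire at the very start of the scan, and it does fire at the end of the last line, neither of which is what you want.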

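If you are only trying to produce debugging output, I strongly recommend flex's built-in trace facility, which you enable with the -d option when you build the scanner. You might also want to use %option yylineno, which tells flex to track the input line number automatically. (Letting flex do this is more robust than doing it yourself, and obviously a little less work.)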
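As a rough sketch of how yylineno can be used in the actions (only a few stand-in rules are shown, and the file name scanner.l below is made up for this illustration), each action can print the current line number next to the token:

```lex
%option noyywrap yylineno
%{
#include <stdio.h>
%}
%%
[0-9]+        printf("%d: NUM\n", yylineno);
[a-zA-Z]+     printf("%d: ID\n", yylineno);
[ \t\n]+      ;
.             printf("%d: unrecognized character '%s'\n", yylineno, yytext);
%%
int main(void)
{
    /* Reads from stdin, which is where yyin points by default. */
    yylex();
    return 0;
}
```

Building it with something like flex -d scanner.l followed by cc lex.yy.c -o scanner should also turn on the trace, which reports on stderr which rule accepted each piece of input. Note that with %option yylineno in effect the rules must not increment yylineno themselves, otherwise newlines are counted twice.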

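If you really do want to print the line number at the beginning of every line, you can use a start condition combined with yyless() to rescan. Here is a minimal example: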

%option nodefault noyywrap noinput nounput
%option yylineno

%x BOL
%%
                BEGIN(BOL);                /* Note 2 */
<BOL>.|\n       { yyless(0);               /* Note 3 */
                  printf("Line %d:", yylineno);
                  BEGIN(INITIAL);
                }
\n              putchar('\n'); BEGIN(BOL); /* Note 4 */

  /* Rest of the rules go here. The following is minimal. */
[[:blank:]]+    ;
[^[:blank:]\n]+ printf(" word: '%s'", yytext);

Notes:

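  1. In fact, you could drop most of the definitions, if not all of them. Is [0-9] less readable than {digit}? I would say no, because its meaning is unambiguous, whereas digit could have been defined as anything. Clearer still is the built-in character class [[:digit:]].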

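  2. (Line 6) Any action placed before the first rule is executed every time yylex is called. In this case we call yylex only once, so we can get away with it; if we were actually returning tokens, it would be more convenient to set the initial state from the driver. Or just use INITIAL as the beginning-of-line state, and some other start condition for normal operation.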

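  3. (Lines 7-9) While we are in the BOL state, we respond to any following character, including a newline (which indicates an empty line). The rule is not executed at EOF, because in that case there is no following character. The response is to remove the character we just read from the token (leaving the token empty) and then print a message saying which line we are on. Finally, we switch to the normal scanning state, which starts at the first character of the line (because of yyless).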

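    It is tempting to try to do this with a ^ anchor, but that does not work. First, flex does not allow empty patterns, so the anchor on its own is not a valid pattern; the following character would still have to be matched. But that character cannot then be rescanned without triggering the anchored rule again, because on the rescan it is still at the beginning of a line. Hence the use of a start condition.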

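  4. (Line 11) When we hit a newline, we need to switch to the BOL state so that the next character (if there is one) triggers the output of the line number. Since this example prints the tokens on the same line as the line number, we also need to send a newline to the output to terminate the current line.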
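The alternative mentioned in note 2, using INITIAL itself as the beginning-of-line state, might look roughly like the sketch below. The start condition name NORMAL is invented for this illustration, and the token rules are the same minimal placeholders as in the example above. Because the scanner starts out in INITIAL, no code is needed before the first rule.

```lex
%option nodefault noyywrap noinput nounput
%option yylineno
%{
#include <stdio.h>
%}

%x NORMAL
%%
<INITIAL>.|\n   { yyless(0);   /* give the character back and rescan it in NORMAL */
                  printf("Line %d:", yylineno);
                  BEGIN(NORMAL);
                }
<NORMAL>{
\n              { putchar('\n'); BEGIN(INITIAL); }
[[:blank:]]+    ;
[^[:blank:]\n]+ printf(" word: '%s'", yytext);
}
%%
int main(void)
{
    yylex();
    return 0;
}
```

Since NORMAL is declared with %x (exclusive), the real token rules have to sit inside the NORMAL scope; a rule without a start condition would be active only in INITIAL, that is, only at the beginning of a line.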