How to generate code with flex / bison and save it to a file

Time: 2018-09-06 14:24:35

Tags: c bison flex-lexer

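I am writing a translator for a uni project that should translate given Pascal code into assembler code using flex / bison. I have written the parser and the lexer, which build the symbol table (atm it only works correctly without procedures and functions). My question is: how do I generate assembler code from this and print it to a file?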

Here is my lexer:

%{
#include "parser.tab.h"
#include <string.h>
#define YY_FLEX_DEBUG 1
%}

letter      [a-zA-Z]
digit       [0-9]
ID          {letter}({letter}|{digit})*
delim       [ \t\n]
NUM         {digit}+(\.{digit}+)?(E[+\-]?{digit}+)?
ws          {delim}+

%%
{ws}        {                                           }
if          {return(IF);                                }
then        {return(THEN);                              }
else        {return(ELSE);                              }
{NUM}       {yylval.stringValue = strdup(yytext); return(NUM);          }
"<"         {yylval.stringValue = "<"; return(RELOP);   }
"<="        {yylval.stringValue = "<="; return(RELOP);  }
"="         {yylval.stringValue = "="; return(RELOP);   }
">"         {yylval.stringValue = ">"; return(RELOP);   }
">="        {yylval.stringValue = ">="; return(RELOP);  }
"<>"        {yylval.stringValue = "<>"; return(RELOP);  }
":="        {return(ASSIGNOP);                          }
do          {return(DO);                                }
program     {return(PROGRAM);                           }
var         {return(VAR);                               }
array       {return(ARRAY);                             }
of          {return(OF);                                }
integer     {return(INTEGER);                           }
real        {return(REAL);                              }
function    {return(FUNCTION);                          }
procedure   {return(PROCEDURE);                         }
begin       {return(START);                             }
end         {return(END);                               }
div         {yylval.stringValue = "div"; return(MULOP); }
mod         {yylval.stringValue = "mod"; return(MULOP); }
and         {yylval.stringValue = "and"; return(MULOP); }
"*"         {yylval.stringValue = "*"; return(MULOP);   }
"/"         {yylval.stringValue = "/"; return(MULOP);   }
while       {return(WHILE);                             }
or          {return(OR);                                }
"+"         {yylval.stringValue = "+"; return(SIGN);    }
"-"         {yylval.stringValue = "-"; return(SIGN);    }
".."        {return(DOUBLEDOT);                         }
","         {return *yytext;                            }
"("         {return *yytext;                            }
")"         {return *yytext;                            }
"["         {return *yytext;                    }
"]"         {return *yytext;                    }
";"         {return *yytext;                                }
":"         {return *yytext;                                }
"."         {return *yytext;                                }
not         {return(NOT);                               }
{ID}        {yylval.stringValue= strdup(yytext); return(ID);}
%%
int yywrap(void) { return 1; }

Here is my parser:

%{
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include "SymbolTable.h"
    int errors;
    int lable;
    #define YYDEBUG 1

    install (char *sym_name)
    {
        symrec *s;
        s = getsym(sym_name);
        if (s == 0)
            s = putsym(sym_name);
        else {
            errors++;
            printf("%s is defined\n", sym_name);
        }
    }

    install_num (char *sym_name)
    {
        symrec *s;
        s = getsym(sym_name);
        if (s == 0)
            s = putnum(sym_name);
    }

    context_check(char *sym_name)
    {
        if (getsym(sym_name) == 0)
            printf("%s is undeclared\n", sym_name);
    }
%}
%union
{
    int intValue;
    float floatValue;
    char *stringValue;
    int adress;
}
%start program
%token <stringValue> ID
%token <stringValue> NUM
%token IF THEN PROGRAM VAR ARRAY
%token OF INTEGER REAL
%token FUNCTION PROCEDURE
%token START END
%token ASSIGNOP RELOP MULOP
%token ELSE WHILE DO
%token SIGN OR
%token DOUBLEDOT
%token NOT
%left '-' '+'
%left '*' '/'
%%
program: PROGRAM ID '(' prog_list ')' ';' declarations subprogram_declarations compound_statement '.'
         ;
prog_list: ID
         | prog_list ',' ID
         ;
identifier_list: ID  {install($1);}
         | identifier_list ',' ID {install($3);} 
         ;
declarations: declarations VAR identifier_list ':' type ';'
         | /* empty */
         ;
type: standart_type
         | ARRAY '[' NUM DOUBLEDOT NUM ']' OF REAL {set_type("REALARR");}
         | ARRAY '[' NUM DOUBLEDOT NUM ']' OF INTEGER {set_type("INTARR");}
         ;
standart_type: INTEGER {set_type("INTEGER");}
         | REAL {set_type("REAL");}
         ;
subprogram_declarations: subprogram_declarations subprogram_declaration ';'
         | /* empty */
;
subprogram_declaration: subprogram_head declarations compound_statement;
subprogram_head: FUNCTION ID arguments ':' INTEGER ';' {install($2); set_type("INTEGER");}
         | FUNCTION ID arguments ':' REAL ';' {install($2); set_type("REAL");}
         | PROCEDURE ID arguments ';' {install($2); set_proc($2);}
         ;
arguments: '(' parameter_list ')'
         | /* empty */;
parameter_list: identifier_list ':' type
         | parameter_list ';' identifier_list ':' type
         ;
compound_statement: START
                    optional_statements END
         ;
optional_statements: statement_list
         | /* empty */
         ;
statement_list: statement
         | statement_list ';' statement
         ;
statement: variable ASSIGNOP expression
         | procedure_statement
         | compound_statement
         | IF expression THEN statement ELSE statement
         | WHILE expression DO statement
         ;
variable: ID {context_check($1);}
         | ID '[' expression ']' {context_check($1);}
         ;
procedure_statement: ID 
         | ID '(' expression_list ')'
         ;
expression_list: expression
         | expression_list ',' expression
         ;
expression: simple_expression
         | simple_expression RELOP simple_expression
         ;
simple_expression: term
         | SIGN term
         | simple_expression SIGN term
         | simple_expression OR term
         ;
term: factor
         | term MULOP factor
         ;
factor: variable
         | ID '(' expression_list ')' {context_check($1);}
         | NUM {install_num($1);}
         | '(' expression ')'
         | NOT factor
         ;
%%
main (int argc, char *argv[]) {
    FILE *output = fopen("output.asm", "w");
    fprintf(output, "\t  jump.i #lab0\n");
    extern FILE *yyin;
    ++argv; --argc;
    yyin = fopen(argv[0], "r");
    yydebug = 1;
    errors = 0;
    yyparse();
    print_sym_table();
    fprintf(output, "\t  exit");
    fclose(output);

}
yyerror (char *s) /* Called by yyparse on error */
{
    errors++;
    printf ("%s\n", s);
}

Here is the symbol table:

struct symrec
{
    char *name;
    int addr;
    char *type;
    struct symrec *next; 
};
typedef struct symrec symrec;
symrec *sym_table = (symrec *)0;
symrec *putsym();
symrec *getsym();
symrec *putnum();
void set_type();
void set_proc();
void set_func();
void print_sym_table();

symrec *putsym(char *sym_name)
{
    symrec *ptr;
    ptr = (symrec *)malloc(sizeof(symrec));
    ptr->name = (char *)malloc(strlen(sym_name) + 1);
    ptr->type = NULL;
    strcpy(ptr->name,sym_name);
    ptr->next = (struct symrec *)sym_table;
    sym_table = ptr;
    return ptr;
}

symrec *putnum(char *sym_name)
{
    symrec *ptr;
    char *dPos = strchr(sym_name, '.');
    char *ePos = strchr(sym_name, 'e');
    ptr = (symrec *)malloc(sizeof(symrec));
    ptr->name = (char *)malloc(strlen(sym_name) + 1);
    if ((dPos == NULL) && (ePos == NULL)){
        ptr->type = (char *)malloc(strlen("INTEGER") + 1);
        strcpy(ptr->type, "INTEGER");
    }
    else if ((dPos != NULL) && (ePos == NULL)) {
        ptr->type = (char *)malloc(strlen("REAL") + 1);
        strcpy(ptr->type, "REAL");
    }
    else {
        ptr->type = (char *)malloc(strlen("FLOAT") + 1);
        strcpy(ptr->type, "FLOAT");
    }
    strcpy(ptr->name,sym_name);
    ptr->next = (struct symrec *)sym_table;
    sym_table = ptr;
    return ptr;
}

void set_type(char *type)
{
    symrec *ptr;
    for (ptr = sym_table; ptr != (symrec *)0; ptr = (symrec *)ptr->next) {
        if (ptr->type == NULL) {
            ptr->type = (char *)malloc(strlen(type) + 1);
            strcpy(ptr->type, type);
        }
    }
}

void set_proc(char *sym_name) {
    symrec *ptr;
    for (ptr = sym_table; ptr != (symrec *)0; ptr = (symrec *)ptr->next)
        if (strcmp (ptr->name, sym_name) == 0){
            ptr->type = (char *)malloc(strlen("PROC") + 1);
            strcpy(ptr->type, "PROC");
        }
}

symrec *getsym(char *sym_name)
{
    symrec *ptr;
    for (ptr = sym_table; ptr != (symrec *)0; ptr = (symrec *)ptr->next)
        if (strcmp (ptr->name, sym_name) == 0)
            return ptr;
    return 0;
}

void print_sym_table()
{
    symrec *ptr;
    for (ptr = sym_table; ptr != (symrec *)0; ptr = (symrec *)ptr->next)
        printf("\n%s    %s\n", ptr->name, ptr->type);
}

A simple test file:

program example(input, output);
var x, y: integer;
var g,h:real;

begin
  g:=x+y;
  write(g)
end.

What it should print to the output file:

     jump.i  #lab0                   ;jump.i  lab0
lab0:
        add.i   0,4,24                  ;add.i   x,y,$t0
        inttoreal.i 24,28               ;inttoreal.i $t0,$t1
        mov.r   28,8                    ;mov.r   $t1,g
        write.r 8                       ;write.r g
        exit                            ;exit    

The comments (such as ;jump.i lab0) are not needed.

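I know how the addresses of the variables should be calculated, and I can translate the Pascal code into this assembler on paper, but I really don't understand what I should put where in the bison or flex file so that the assembler code is written to the output file. I tried to generate the label for the begin statement in the rule: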

compound_statement: START {fprintf(output, "lab0\n");}
                    optional_statements END

but I got a segmentation fault. It is fairly clear how to generate labels, but how should I generate

add.i 0, 4, 24

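Should I create another parser after building the symbol table with this one? Or can it be done without an additional parser? I need some hints on what to do next.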

2 answers:

Answer 0 (score: 1)

So you have this bit of code:

compound_statement: START {fprintf(output, "lab0\n");}
                    optional_statements END

You are going in the right direction, but you get a segmentation fault when you add this action because output has not been initialized.

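I can't see where you have declared the output that the action refers to, but it is not the same variable as the one declared in main, where you open the file for output.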

main (int argc, char *argv[]) {
    FILE *output = fopen("output.asm", "w");

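That version of output is local to main and only visible inside that function. If you remove the declaration of output from main and keep only the assignment, the result of fopen is assigned to the globally declared output that your bison code is using.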

main (int argc, char *argv[]) {
    output = fopen("output.asm", "w");
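
The difference between the two declarations can be sketched in plain C. This is only an illustration, not the asker's actual file: the helper names open_output, emit_label and close_output are hypothetical, and in a bison grammar the file-scope output would presumably sit in the %{ ... %} prologue so that the rule actions can see it.

```c
#include <assert.h>
#include <stdio.h>

/* File-scope handle: visible to every function in this translation unit,
   which is what the grammar actions need. */
static FILE *output = NULL;

/* Hypothetical helper: assigns to the file-scope handle.
   Writing `FILE *output = fopen(...)` here instead would create a second,
   local variable and leave the file-scope one NULL. */
int open_output(const char *path)
{
    assert(path != NULL);
    output = fopen(path, "w");
    return output != NULL ? 0 : -1;
}

/* Hypothetical helper that an action could call, e.g. emit_label(0). */
int emit_label(int n)
{
    if (output == NULL)
        return -1;              /* refuse to write through a NULL handle */
    return fprintf(output, "lab%d:\n", n) < 0 ? -1 : 0;
}

/* Close the file and reset the handle so later calls fail cleanly. */
int close_output(void)
{
    int rc = 0;
    if (output != NULL) {
        rc = (fclose(output) == 0) ? 0 : -1;
        output = NULL;
    }
    return rc;
}
```

Calling emit_label before open_output returns -1 here instead of crashing, which is roughly the situation the segmentation fault in the question points to: the action wrote through a handle that was never assigned.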

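I am not sure why you are confused about the other part of your question, since you have already demonstrated how to do it in your parser. Take this bit of your parser: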

variable: ID {context_check($1);}

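It takes the value of "ID" ($1) and passes it to that function. If you want "variable" to carry a value, you store it in $$. Then, when you use "variable" higher up, like here: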

statement: variable ASSIGNOP expression

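$1 will contain whatever value you put in $$ for "variable". $2 will be the value obtained from the "ASSIGNOP" token, and $3 will hold the result from "expression". And again, if you store a value in $$, you can use it in anything that expects a "statement".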

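$$, $1 and so on all have the type you created with %union, so you can also write $$.intValue, or select another member of the union in the same way, if you need to be specific about which value you are setting.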
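
To connect this with the add.i 0,4,24 line from the question, one option is to let $$ carry the address of each operand. The sketch below is plain C and rests on assumptions that are not in the original code: integers take 4 bytes and reals 8 (which is consistent with the addresses in the expected output), the data area starts at 0, and the helper names alloc_addr and emit_add_int are hypothetical. An action for simple_expression SIGN term could then do something like `$$ = emit_add_int(output, $1, $3);`, provided those nonterminals are declared with an integer %type (for instance the adress member of the %union).

```c
#include <assert.h>
#include <stdio.h>

enum val_type { T_INTEGER, T_REAL };

/* Next free address in the data area (assumed to start at 0). */
static int next_addr = 0;

/* Reserve storage for one variable or temporary and return its address.
   The sizes are an assumption: 4 bytes per integer, 8 per real. */
int alloc_addr(enum val_type t)
{
    int addr = next_addr;
    next_addr += (t == T_INTEGER) ? 4 : 8;
    return addr;
}

/* Emit an integer addition of two operand addresses into a fresh
   temporary and return the temporary's address, i.e. the value that
   an action would store in $$. */
int emit_add_int(FILE *out, int left, int right)
{
    assert(out != NULL);
    int result = alloc_addr(T_INTEGER);
    fprintf(out, "\tadd.i\t%d,%d,%d\n", left, right, result);
    return result;
}
```

With the declarations from the test file (x, y integer; g, h real) allocated in that order, this numbering gives x=0, y=4, g=8, h=16, so the first temporary lands at 24, matching the expected output. Choosing add.i versus a real-valued instruction, and inserting inttoreal conversions, would need the operand types as well, which this sketch leaves out.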

Answer 1 (score: 1)

For example, in your parser you have this pattern:

| term MULOP factor

You would want to attach an action to this pattern, something like:

{ fprintf(output, "mul term, factor, result\n"); }

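But it starts to get very sticky: where are term and factor, and where should the result go? The simplest answer is a stack: whenever a variable is referenced, push its value onto the stack. Whenever an operation is matched, pop the operands into registers, perform the operation, and push the result, so the above becomes: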

{
    fprintf(output, "pop r0; pop r1; mul r1, r0, r0;");
    fprintf(output, "push r0\n");
}

And assignment is just popping the stack into a variable.
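
The stack scheme can be sketched as a few C helpers that the grammar actions would call. Everything named here is hypothetical: the helpers emit_push, emit_binop and emit_pop_into are made up for illustration, and the push / pop / mul mnemonics with registers r0 and r1 follow this answer's pseudo-assembly, not the target instruction set shown in the question.

```c
#include <assert.h>
#include <stdio.h>

/* Push the value of a variable (action for a variable used as an operand,
   e.g. on the rule factor: variable). */
void emit_push(FILE *out, const char *name)
{
    assert(out != NULL && name != NULL);
    fprintf(out, "push %s\n", name);
}

/* Pop both operands, apply `op`, and push the result
   (action for e.g. term MULOP factor). */
void emit_binop(FILE *out, const char *op)
{
    assert(out != NULL && op != NULL);
    fprintf(out, "pop r0\n");              /* right operand was pushed last */
    fprintf(out, "pop r1\n");              /* left operand */
    fprintf(out, "%s r1, r0, r0\n", op);   /* result ends up in r0 */
    fprintf(out, "push r0\n");
}

/* Pop the top of the stack into a variable (action for an assignment). */
void emit_pop_into(FILE *out, const char *name)
{
    assert(out != NULL && name != NULL);
    fprintf(out, "pop %s\n", name);
}
```

For a statement such as g := x * y, the calls would come in the order emit_push(out, "x"), emit_push(out, "y"), emit_binop(out, "mul"), emit_pop_into(out, "g"), because bison reduces both operands before it reduces the rule that contains the operator, and reduces the assignment last.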