带有模糊性的Context Free Grammar - 使用Ply lex / yacc生成器的小编译器

时间:2018-04-29 06:37:20

标签: python parsing compiler-construction ply

我正在使用Ply lex / yaac生成器为Python编写一个小编译器,用于编译器类。现在我们正在进入语义分析阶段,很快就会进行代码生成。但是我的语法还没有完成,因为我无法为函数调用编写规则,因为我到目前为止尝试的所有内容都以转换/减少冲突结束,这使规则被忽略。

有以下规则:function_call : ID function_call_args

   function_call_args : expression
                      | function_call

program : function_definition program
              | class_definition program
              | function_call
              | empty

当我添加规则expression : function_call时,结果是模棱两可的并且给出了转移/减少冲突。我希望我的语法中的function_call可以是一个表达式,然后一个人就可以编写一些代码,比如a + b + f(2)* g(a / x) - h()。

我是否添加空作为function_call_args或logical_expression的替代,它也变得模棱两可。或者我是否只是告诉我返回一个function_call作为替代方法的return语句也可以减少冲突。后者并不重要,如果function_call可以成为表达式的替代,那么能够在函数的任何位置调用函数并且即使返回函数调用也足够了。

Bellow遵循词法分析器,解析器和完整语法的完整代码:

全文法

Rule 0     S' -> program
Rule 1     program -> function_definition program
Rule 2     program -> class_definition program
Rule 3     program -> function_call
Rule 4     program -> empty
Rule 5     function_call -> ID function_call_args
Rule 6     function_call_args -> expression
Rule 7     function_call_args -> function_call
Rule 8     class_definition -> DEFINE CLASS ID LPAREN RPAREN LBRACE program RBRACE
Rule 9     function_definition -> DEFINE data_type ID LPAREN function_definition_args RPAREN block
Rule 10    function_definition_args -> var_list
Rule 11    function_definition_args -> empty
Rule 12    var_list -> var_declaration COMMA var_list
Rule 13    var_list -> var_declaration
Rule 14    var_declaration -> data_type ID
Rule 15    data_type -> VOID
Rule 16    data_type -> INT
Rule 17    data_type -> FLOAT
Rule 18    data_type -> STRING
Rule 19    data_type -> CHAR
Rule 20    block -> LBRACE statements RBRACE
Rule 21    statements -> statement statements
Rule 22    statements -> empty
Rule 23    statement -> matched_statement
Rule 24    statement -> unmatched_statement
Rule 25    matched_statement -> IF LPAREN logical_expression RPAREN matched_statement ELSE matched_statement
Rule 26    matched_statement -> WHILE LPAREN logical_expression RPAREN matched_statement
Rule 27    matched_statement -> FOR LPAREN var_declaration SEMI_COLON logical_expression SEMI_COLON attribution RPAREN matched_statement
Rule 28    matched_statement -> return_statement
Rule 29    matched_statement -> compound_statement
Rule 30    matched_statement -> simple_statement
Rule 31    matched_statement -> block
Rule 32    unmatched_statement -> IF LPAREN logical_expression RPAREN matched_statement
Rule 33    unmatched_statement -> IF LPAREN logical_expression RPAREN matched_statement ELSE unmatched_statement
Rule 34    logical_expression -> logical_term OR logical_term
Rule 35    logical_expression -> logical_term
Rule 36    logical_term -> logical_factor AND logical_factor
Rule 37    logical_term -> logical_factor
Rule 38    logical_factor -> boolean_statement
Rule 39    logical_factor -> NOT logical_factor
Rule 40    logical_factor -> LPAREN logical_expression RPAREN
Rule 41    boolean_statement -> value comparison_op value
Rule 42    boolean_statement -> value
Rule 43    value -> ID
Rule 44    value -> number
Rule 45    number -> int
Rule 46    number -> float
Rule 47    int -> INTCONST
Rule 48    float -> FLOATCONST
Rule 49    comparison_op -> LESS_THAN
Rule 50    comparison_op -> LESS_EQUAL
Rule 51    comparison_op -> GREATER_THAN
Rule 52    comparison_op -> GREATER_EQUAL
Rule 53    comparison_op -> EQUAL
Rule 54    comparison_op -> NOT_EQUAL
Rule 55    compound_statement -> var_list
Rule 56    compound_statement -> attribution
Rule 57    simple_statement -> expression
Rule 58    attribution -> var_declaration EQUALS expression
Rule 59    attribution -> var_declaration EQUALS string
Rule 60    attribution -> var_declaration EQUALS char
Rule 61    attribution -> ID EQUALS expression
Rule 62    attribution -> ID EQUALS STRINGCONST
Rule 63    attribution -> ID EQUALS CHARCONST
Rule 64    string -> STRINGCONST
Rule 65    char -> CHARCONST
Rule 66    expression -> expression PLUS term
Rule 67    expression -> expression MINUS term
Rule 68    expression -> term
Rule 69    term -> term TIMES factor
Rule 70    term -> term DIVIDE factor
Rule 71    term -> term MOD factor
Rule 72    term -> factor
Rule 73    factor -> number
Rule 74    factor -> ID
Rule 75    factor -> factor_expr
Rule 76    factor_expr -> LPAREN expression RPAREN
Rule 77    uminus_expression -> MINUS LPAREN expression RPAREN
Rule 78    empty -> <empty>
Rule 79    return_statement -> RETURN expression
Rule 80    function_definition -> DEFINE data_type ID LPAREN function_definition_args RPAREN error

词法:

import ply.lex as lex
import ply.yacc as yacc


# Reserved words
reserved = {
    'if' : 'IF',
    'else' : 'ELSE',
    'while' : 'WHILE',
    'for' : 'FOR',
    'switch' : 'SWITCH',
    'case' : 'CASE',
    'class' : 'CLASS',
    'define' : 'DEFINE',
    'int' : 'INT',
    'float' : 'FLOAT',
    'char' : 'CHAR',
    'string' : 'STRING',
    'void' : 'VOID',
    'equal' : 'EQUAL',
    'and' : 'AND',
    'or' : 'OR',
    'not' : 'NOT',
    'do' : 'DO',
    'return' : 'RETURN',
}

# The tokens declaration is made here.
tokens = [

    # Literals (identifier, integer constant, float constant, string constant,
    # char const)
    # TODO add constants' Types
    'ID',
    #'NUMBER',

    # Operators +,-,*,/,%
    'PLUS',
    'MINUS',
    'TIMES',
    'DIVIDE',
    'MOD',

    # Logical Operators
    'LESS_THAN',
    'LESS_EQUAL',
    'GREATER_THAN',
    'GREATER_EQUAL',
    'NOT_EQUAL',

    # Delimeters such as (),{},[],:
    'LPAREN',
    'RPAREN',
    'LBRACE',
    'RBRACE',
    'COLON',
    'COMMA',
    'SEMI_COLON',

    #Assignment Operators
    'EQUALS',

    # Integer literal
    'INTCONST',
    # Floating literal
    'FLOATCONST',
    # String literal
    'STRINGCONST',
    # Character constant 'c' or L'c'
    'CHARCONST'
] 

tokens +=  list(reserved.values())

# Regular expression rules for simple tokens
#t_NUMBER  = r'\d+'

# Operators
t_PLUS    = r'\+'
t_MINUS   = r'-'
t_TIMES   = r'\*'
t_DIVIDE  = r'/'
t_MOD     = r'\%'

# Logical Operators
t_LESS_THAN     = r'<'
t_LESS_EQUAL    = r'<=' 
t_GREATER_THAN  = r'>'
t_GREATER_EQUAL = r'>='
t_NOT_EQUAL     = r'!='

# Delimiters
t_LPAREN     = r'\('
t_RPAREN     = r'\)'
t_COLON      = r'\:'
t_LBRACE     = r'\{'
t_RBRACE     = r'\}'
t_COMMA      = r','
t_SEMI_COLON = R';'

# Assignment
t_EQUALS  = r'='


# Integer literal
t_INTCONST = r'\d+'

# Floating literal
t_FLOATCONST = r'\d+[.]\d+'

# String literal
t_STRINGCONST = r'\"([^\\\n]|(\\.))*?\"'

# Character constant 'c' or L'c'
t_CHARCONST = r'\'([^\\\n]|(\\.))*?\''


# Rule for identificator
def t_ID(t):
    r'[a-zA-Z_][a-zA-Z0-9_]*'
    t.type = reserved.get(t.value,'ID')    # Check for reserved words
    return t

# Define a rule so we can track line numbers
def t_newline(t):
    r'\n+'
    t.lexer.lineno += len(t.value)

# Compute column.
#     input is the input text string
#     token is a token instance
def find_column(input, token):
    line_start = input.rfind('\n', 0, token.lexpos) + 1
    return (token.lexpos - line_start) + 1

# A string containing ignored characters (spaces and tabs)
t_ignore  = ' \t'

# Error handling rule
def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

# One line comments Python alike
def t_comment(t):
    r"[ ]*\043[^\n]*"  # \043 is '#'
    pass

# Build the lexer
#lexer = lex.lex()

lex.lex()

if __name__ == '__main__':
    lex.runmain()

分析器:

import sys
sys.path.append("..")
import ply.yacc as yacc
from lexical_analyzer import tokens
from ast import *
from semantical_analyser.semantical_analyzer import SymbolTable, Var, Scope
from semantical_analyser.stack import Stack

symbolTable = SymbolTable()
stack = Stack()
stack.push(Scope())


precedence = (
   ('nonassoc', 'LESS_THAN', 'GREATER_THAN'),  # Nonassociative operators
   ('nonassoc', 'LESS_EQUAL', 'GREATER_EQUAL'),  # Nonassociative operators
    ('left', 'PLUS', 'MINUS'),
    ('left', 'TIMES', 'DIVIDE', 'MOD'),
    ('right', 'UMINUS'),            # Unary minus operator
)

# Initial Rule
def p_program(p):
   '''program : function_definition program
              | class_definition program
              | function_call
              | empty'''
   if len(p) == 3:
      p[0] = Program(p[1], p[2])
   else:
      p[0] = p[1]
      stack.push(Scope())
   pass


#def p_function_call_expr(p):
#   '''expression : function_call'''

def p_function_call(p):
   '''function_call : ID LPAREN function_call_args RPAREN'''
   #p[0] = Node("FUNCTION_CALL", [p[1], p[2], p[3]])
   p[0] = FunctionCall(p[1], p[2])
   pass

def p_function_call_args(p):
    '''function_call_args : expression
                          | function_call'''
    #p[0] = Node("FUNCTION_CALL_ARGS", [p[1]])
    p[0] = p[1]
    pass

def p_class_definition(p):
   ''' class_definition : DEFINE CLASS ID LPAREN RPAREN LBRACE program RBRACE'''
   #p[0] = ClassDefinition("CLASS_DEFINITION", [p[1], p[2], p[3], p[4], p[5], p[6]], p[7], p[8]])
   p[0] = ClassDefinition(p[3], p[7])
   pass

# Function definition
def p_function_definition(p):
    '''function_definition : DEFINE data_type ID LPAREN function_definition_args RPAREN block'''
    #p[0] = Node("FUNCTION_DEFINITION", [p[1], p[2], p[3], p[4], p[5], p[6], p[7]])
    p[0] = FunctionDefinition(p[2], p[3], p[5], p[7])
    pass

# Arguments that goes inside a function definition
def p_function_definition_args(p):
    '''function_definition_args : var_list
                                | empty'''
    #p[0] = Node("FUNCTION_DEFINITION_ARGS", [p[1]])
    p[0] = p[1]
    pass

# list of declaration of variables or a single variable
def p_var_list(p):
    '''var_list : var_declaration COMMA var_list
                | var_declaration'''
    if len(p) >= 3:
        p[0] = VarList(p[1], p[3])
    else:
        p[0] = p[1]
    pass

# declaration of var
def p_var_declaration(p):
   'var_declaration : data_type ID'
   p[0] = VarDeclaration(p[1], p[2])
   var = Var("ID", p[2], p[1])
   symbolTable.addVar(var)
   topOfStack = stack.peek()
   topOfStack.addObj(var)
   pass

# data types for variables 
def p_data_type(p):
    '''data_type : VOID 
                 | INT 
                 | FLOAT 
                 | STRING 
                 | CHAR'''
    p[0] = DataType(p[1])
    pass

# block - a block is enclosed by braces {stmt}
def p_block(p):
   'block : LBRACE statements RBRACE'
   #p[0] = Node("BLOCK", [p[1], p[2], p[3]])
   p[0] = Block(p[2])
   #stack.push(Scope())
   pass

def p_statements(p):
   '''statements : statement statements
                 | empty'''
   if len(p) >= 3:
      p[0] = Statements(p[1], p[2])
   else:
      p[0] = Node("EMPTY", [])
   pass

# statement is either a compound statement or a simple statement
def p_statement(p):
   '''statement : matched_statement
                | unmatched_statement'''
   #p[0] = Node("STATEMENT", [p[1]])
   p[0] = p[1]
   pass

# Matched Statement - This is used to deal with If ambiguity
def p_matched_statement(p):
   ''' matched_statement : IF LPAREN logical_expression RPAREN matched_statement ELSE matched_statement 
                         | WHILE LPAREN logical_expression RPAREN matched_statement
                         | FOR LPAREN var_declaration SEMI_COLON logical_expression SEMI_COLON attribution RPAREN matched_statement 
                         | return_statement
                         | compound_statement
                         | simple_statement
                         | block'''
   if len(p) == 9:
      #p[0] = Node("FOR-CLAUSE", [p[1], p[2], p[3], p[4], p[5], p[6], p[7], p[8], p[9]])
      p[0] = ForStatement(p[3], p[5], p[7], p[9])
   elif len(p) == 8:
      #p[0] = Node("IF-ELSE-CLAUSE", [p[1], p[2], p[3], p[4], p[5], p[6], p[7]])
      p[0] = IfElseStatement(p[3], p[5], p[7])
   elif len(p) == 6:
      #p[0] = Node("WHILE-CLAUSE", [p[1], p[2], p[3], p[4], p[5]])
      p[0] = WhileStatement(p[3], p[5])
   else:
      #p[0] = Node("MATCHED_STATEMENT", [p[1]])
      p[0] = p[1]
   pass

# Unmateched statement - This is used to deal with If ambiguity 
def p_unmatched_statement(p):
   ''' unmatched_statement : IF LPAREN logical_expression RPAREN matched_statement 
                           | IF LPAREN logical_expression RPAREN matched_statement ELSE unmatched_statement '''
   if len(p) == 7:
     #p[0] = Node("IF-ELSE-CLAUSE", [p[1], p[2], p[3], p[4], p[5], p[6]])
      p[0] = IfElseStatement(p[3], p[5], p[6])
   else:
     # p[0] = Node("IF-CLAUSE", [p[1], p[2], p[3], p[4], p[5]])
      p[0] = IfStatement(p[3], p[5])
      #stack.push(Scope())
      #currentScope = stack.peek()
      #newScope = Scope()
      #currentScope.addObj(newScope)
   pass

# Logical expression OR
def p_logical_expression_or(p):
   'logical_expression : logical_term OR logical_term'
   p[0] = Node("LOGICAL_EXPRESSION", [p[1], p[2]], p[3])
   pass

# Logical expression term
def p_logical_expression_term(p):
   'logical_expression : logical_term'
   p[0] = Node("LOGICAL_EXPRESSION_TERM", [p[1]])
   pass

# Logical expression AND
def p_logical_term_and(p):
   'logical_term : logical_factor AND logical_factor'
   #p[0] = Node("LOGICAL_EXPRESSION_AND", [p[1]])
   p[0] = BinaryOp(p[1], p[2], p[3])
   pass

# Logical expression factor
def p_logical_term_factor(p):
   'logical_term : logical_factor'
   #p[0] = Node("LOGICAL_EXPRESSION_FACTOR", [p[1]])
   p[0] = p[1]
   pass

# Logical expression NOT, boolean or another logical expression enclosed by parens
def p_logical_factor(p):
   '''logical_factor : boolean_statement
                     | NOT logical_factor
                     | LPAREN logical_expression RPAREN'''
   if len(p) == 4:
      #p[0] = Node("LOGICAL_EXPRESSION_FACTOR", [p[1], p[2], p[3]])
      p[0] = BinaryOp(p[1], p[2], p[3])
   elif len(p) == 3:
      #p[0] = Node("LOGICAL_EXPRESSION_FACTOR", [p[1], p[2]])
      p[0] = NotExpression(p[2])
   else:
      #p[0] = Node("LOGICAL_EXPRESSION_FACTOR", [p[1]])
      p[0] = p[1]
   pass

# boolean_statement
def p_boolean_statement(p):
   '''boolean_statement : value comparison_op value
                        | value'''
   if len(p) == 4:
      #p[0] = Node("BOOLEAN_STATEMENT", [p[1], p[3]], p[2])
      p[0] = BinaryOp(p[1], p[2], p[3])
   else:
      #p[0] = Node("BOOLEAN_STATEMENT", [p[1]])
      p[0] = p[1]
   pass


def p_value(p):
   ''' value : ID
             | number'''
   #p[0] = Node("ID_OR_NUMBER", [p[1]])
   p[0] = p[1]
   pass

def p_number(p):
   '''number : int
             | float'''
   p[0] = Number(p[1])
   pass

def p_int(p):
   'int : INTCONST'
   p[0] = Const('int', p[1])
   pass

def p_float(p):
   'float : FLOATCONST'
   p[0] = Const('float', p[1])
   pass

def p_comparison_op(p):
   '''comparison_op : LESS_THAN
                    | LESS_EQUAL
                    | GREATER_THAN
                    | GREATER_EQUAL
                    | EQUAL
                    | NOT_EQUAL'''
   #p[0] = Node("COMPARISON_OP", [p[1]])
   p[0] = p[1]
   pass

# Compound statement is a while or for or do while or switch case
def p_compound_statment(p):
   '''compound_statement : var_list
                         | attribution'''
   #p[0] = Node("COMPOUND_STATEMENT", [p[1]])
   p[0] = p[1]
   pass

# A simple statement is a declaration, or 
def p_simple_statement(p):
   '''simple_statement : expression'''
   #p[0] = Node("SIMPLE_STATEMENT", [p[1]])
   p[0] = p[1]
   pass

# Production for atrribution
def p_atrribution(p):
    '''attribution : var_declaration EQUALS expression 
                   | var_declaration EQUALS string
                   | var_declaration EQUALS char
                   | ID EQUALS expression
                   | ID EQUALS STRINGCONST
                   | ID EQUALS CHARCONST'''
    #p[0] = Node("ATTRIBUTION", [p[1], p[3]], p[2])
    #p[0] = BinaryOp(p[1], p[2], p[3])
    p[0] = Attribution(p[1], p[3])
    pass

def p_string(p):
   'string : STRINGCONST'
   p[0] = Const('string', p[1])
   pass

def p_char(p):
   'char : CHARCONST'
   p[0] = Const('char', p[1])
   pass

# Gramatical rules for expressions
def p_expression_plus(p):
    'expression : expression PLUS term'
    #p[0] = Node("binop_plus", [p[1],p[3]], p[2])
    p[0] = BinaryOp(p[1], p[2], p[3])
    pass

def p_expression_minus(p):
    'expression : expression MINUS term'
    #p[0] = Node("BINOP_MINUS", [p[1], p[3]], p[2])
    p[0] = BinaryOp(p[1], p[2], p[3])
    pass

def p_expression_term(p):
   'expression : term'
   #p[0] = Node("BINOP_TERM", [p[1]])
   p[0] = p[1]
   pass

def p_term_times(p):
   'term : term TIMES factor'
   #p[0] = Node("BINOP_TERM_TIMES", [p[1], p[3]], p[2])
   p[0] = BinaryOp(p[1], p[2], p[3])
   pass

def p_term_div(p):
   'term : term DIVIDE factor'
   #p[0] = Node("BINOP_DIVIDE", [p[1], p[3]], p[2])
   p[0] = BinaryOp(p[1], p[2], p[3])
   pass

def p_term_mod(p):
   'term : term MOD factor'
   #p[0] = Node("BINOP_MOD", [p[1], p[3]], p[2])
   p[0] = BinaryOp(p[1], p[2], p[3])
   pass

def p_term_factor(p):
   'term : factor'
   #p[0] = Node("BINOP_FACTOR", [p[1]])
   p[0] = p[1]
   pass

def p_factor_num(p):
   '''factor : number 
             | ID 
             | factor_expr'''
   #p[0] = Node("BINOP_FACTOR", [p[1]])
   p[0] = p[1]
   pass

def p_factor_expr(p):
   'factor_expr : LPAREN expression RPAREN'
   #p[0] = Node("BINOP_EXPR", [p[1], p[3]], p[2])
   p[0] = p[1]
   pass

def p_uminus_expr(p):
   'uminus_expression : MINUS LPAREN expression RPAREN %prec UMINUS'
   pass

# The empty production
def p_empty(p):
   'empty :'
   p[0] = None #Node("EMPTY", [])
   pass

# Return statement1
def p_return_statement(p):
   'return_statement : RETURN expression'
   #p[0] = Node("RETURN_STATEMENT", [p[1], p[2]])
   p[0] = ReturnStatement(p[2])
   pass


# Error rule for syntax errors
def p_error(p):
   if p:
      print("Syntax error at token", p.type)
      print("Syntax error at '%s'" % p.value)
      print("line : '%s'" % p.lineno)
      print("column: '%s'" % p.lexpos)
      #print("Syntax error in input!")
      #parser.errok()
   else:
      print("Syntax error at EOF")



def p_function_definition_error(p):
   '''function_definition : DEFINE data_type ID LPAREN function_definition_args RPAREN error'''
   print("Syntax error in function statement at '%s'" % p)
   #print('Line' + str(p.linespan(1)))

parser = yacc.yacc()
from read_file_into_buffer import readFileIntoBuffer
data = readFileIntoBuffer('test.fpl')
ast = parser.parse(data, tracking=True)
print(ast)

1 个答案:

答案 0 :(得分:0)

假设您的语言是C语言,f 2不是有效的函数调用;必须写成f(2)。但是yiur语法说:

Rule 5     function_call -> ID function_call_args
Rule 6     function_call_args -> expression

当然允许f 2ID后跟expression)。另一方面,该语法不允许f()也不允许f(1,2)

(第三个作品function_call_args -> function_call是多余的,因为你有function_call_args -> expressionexpression -> function_call。冗余制作会产生解析冲突,所以我把它遗漏了。)

您需要的function_callfunction_definition非常相似,具有明确的括号和表达式的可选列表