I'm having a hard time figuring out whether there is a bug in my code or whether PLY simply doesn't catch certain lexical errors. I'm writing a small compiler in Python for a class I'm taking at university, and I'm using the PLY library to build the lexer. In the scanning part I have the following rules for identifiers and numbers:
# Rule for identifiers
def t_ID(t):
    r'[a-zA-Z_][a-zA-Z0-9_]*'
    t.type = reserved.get(t.value, 'ID')    # Check for reserved words
    return t

# Regular expression rules for simple tokens
t_NUMBER = r'\d+'
The number rule is only temporary; I'm going to add something more specific to handle floats and integers (I put a rough sketch of what I mean after the full listing below). The problem is that when I test the scanner with input like 012abc, it gives me back:
LexToken(NUMBER,'012',1,0)
LexToken(ID,'abc',1,3)
Shouldn't this raise an error instead?
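As far as I can tell, the lexer is simply doing a longest-match split: \d+ consumes the leading digits and the ID rule takes the rest. A quick sanity check with plain re outside of PLY (just my own illustration, not PLY internals) shows the same split:

import re

s = '012abc'
m = re.match(r'\d+', s)        # matches '012'
print(m.group(), s[m.end():])  # prints: 012 abc -- the remainder fits the ID rule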
Here is the full code:
import ply.lex as lex
import ply.yacc as yacc
# Reserved words
reserved = {
    'if'     : 'IF',
    'else'   : 'ELSE',
    'while'  : 'WHILE',
    'for'    : 'FOR',
    'switch' : 'SWITCH',
    'case'   : 'CASE',
    'class'  : 'CLASS',
    'define' : 'DEFINE',
    'int'    : 'INT',
    'float'  : 'FLOAT',
    'string' : 'STRING',
    'void'   : 'VOID',
    'equal'  : 'EQUAL',
    'and'    : 'AND',
    'or'     : 'OR',
    'not'    : 'NOT',
    'do'     : 'DO',
}
# The token declarations are made here.
tokens = [
    # Literals (identifier, integer constant, float constant, string constant,
    # char constant)
    # TODO add constants' types
    'ID',
    'NUMBER',
    # Operators +,-,*,/,%
    'PLUS',
    'MINUS',
    'TIMES',
    'DIVIDE',
    'MOD',
    # Relational operators
    'LESS_THAN',
    'LESS_EQUAL',
    'GREATER_THAN',
    'GREATER_EQUAL',
    'NOT_EQUAL',
    # Delimiters such as (), {}, [], :
    'LPAREN',
    'RPAREN',
    'LBRACE',
    'RBRACE',
    'COLON',
    # Assignment operators
    'EQUALS',
]
tokens += list(reserved.values())
# Regular expression rules for simple tokens
t_NUMBER = r'\d+'
# Operators
t_PLUS = r'\+'
t_MINUS = r'-'
t_TIMES = r'\*'
t_DIVIDE = r'/'
t_MOD = r'%'
# Relational operators
t_LESS_THAN = r'<'
t_LESS_EQUAL = r'<='
t_GREATER_THAN = r'>'
t_GREATER_EQUAL = r'>='
t_NOT_EQUAL = r'!='
# Delimiters
t_LPAREN = r'\('
t_RPAREN = r'\)'
t_COLON = r':'
t_LBRACE = r'\{'
t_RBRACE = r'\}'
# Assignment
t_EQUALS = r'='
# Rule for identifiers
def t_ID(t):
    r'[a-zA-Z_][a-zA-Z0-9_]*'
    t.type = reserved.get(t.value, 'ID')    # Check for reserved words
    return t
# Define a rule so we can track line numbers
def t_newline(t):
    r'\n+'
    t.lexer.lineno += len(t.value)
# Compute the column of a token.
#   input is the input text string
#   token is a token instance
def find_column(input, token):
    line_start = input.rfind('\n', 0, token.lexpos) + 1
    return (token.lexpos - line_start) + 1
# A string containing ignored characters (spaces and tabs)
t_ignore = ' \t'
# Error handling rule
def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)
# One-line comments, Python style
def t_comment(t):
    r'[ ]*\043[^\n]*'    # \043 is '#'
    pass                 # No return value, so the comment is discarded
# Build the lexer
lexer = lex.lex()
# read input for test purposes
#from read_file_into_buffer import readFileIntoBuffer
#data = readFileIntoBuffer('test.fpl')
data = input()
# feed lexer with input
lexer.input(data)
# TODO - Create a function to read the input from a file;
#        the file shall be passed as an argument
# Tokenize
while True:
    tok = lexer.token()
    if not tok:
        break    # No more input
    print(tok)
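For reference, this is roughly the direction I have in mind for replacing t_NUMBER: it splits integers from floats and tries to catch runs like 012abc in a single rule. It's only a sketch, though; the token names INT_CONST and FLOAT_CONST are placeholders I would still have to add to the tokens list, and I'm not sure that returning nothing is the idiomatic PLY way to drop a bad lexeme:

# Sketch only: these function rules would replace t_NUMBER.
# PLY tries function rules in their order of definition, so the
# float rule must come before the integer rule.
def t_FLOAT_CONST(t):
    r'\d+\.\d+'
    t.value = float(t.value)
    return t

def t_INT_CONST(t):
    r'\d+[a-zA-Z_]\w*|\d+'
    if not t.value.isdigit():
        # Digits running straight into letters (e.g. 012abc) get
        # reported here instead of being split into NUMBER + ID.
        print("Malformed number '%s'" % t.value)
        return None    # discard the lexeme, as with comments
    t.value = int(t.value)
    return t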