Python PLY parser

Time: 2011-10-26 21:03:29

Tags: python parsing yacc ply

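I have searched for an answer to this question but can't seem to find one. I'm trying to write a parser in Python, using PLY, for a made-up language. A simplified version of my BNF looks like this: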

statement-list -> statement ',' statement-list |
                 'print' expr

statement -> ident 'was' 'a' type |
             ident 'became' expr

type -> 'number' | 'letter'

expr -> factor |
       expr '+' factor |
       expr '-' factor

factor -> number | letter | ident

where number and letter are analogous to int and char.

The Yacc documentation (http://www.dabeaz.com/ply/ply.html#ply_nn23) only shows the grammar for simple arithmetic expressions, where it is obvious what p[0] should be.

def p_expression_plus(p):
    'expression : expression PLUS term'
    p[0] = p[1] + p[3]

My question is: what do I do for the statement-list in my BNF? I have:

def p_statement_list_comma(p):
    'statement-list : statement COMMA statement-list'

But I'm really not sure what to put next. Any help is much appreciated!

2 answers:

Answer 0: (score: 7)

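I can't speak to a PLY solution, but here is one using pyparsing. A pyparsing example can be useful as a quick-and-dirty prototype or exercise, even if you ultimately want to implement the parser with another library. Unfortunately, this example leans heavily on the operatorPrecedence method, which hides a lot of the infix-parsing magic, so I don't know how easily you will be able to translate it. A more traditional expr/term/factor parser example, titled fourFn.py, can be found on the Examples page of the pyparsing wiki (http://pyparsing.wikispaces.com/Examples).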

bnf = """
statement-list -> statement ',' statement-list

statement -> ident 'was' 'a' type | 
             ident 'became' expr |
             'print' expr |
             'if' conditional-expr statement

type -> 'number' | 'letter' 

expr -> factor | 
       expr '+' factor | 
       expr '-' factor 

factor -> number | letter | ident 
"""

from pyparsing import (CaselessKeyword, Word, nums, alphas, alphanums, operatorPrecedence, 
    Forward, MatchFirst, opAssoc, oneOf, Group, delimitedList)

PRINT, WAS, A, BECAME, NUMBER, LETTER, IF, ELSE, TRUE, FALSE, AND, OR, NOT = map(
    CaselessKeyword,
    "print was a became number letter if else true false and or not".upper().split())
keyword = MatchFirst([PRINT, WAS, A, BECAME, NUMBER, LETTER, IF, ELSE, TRUE, FALSE, AND, OR, NOT])

typeSpecifier = NUMBER | LETTER

number = Word(nums)
ident = ~keyword + Word(alphas, alphanums+'_')
operand = number | ident

expr = operatorPrecedence(operand,
    [
    ('-', 1, opAssoc.RIGHT),
    (oneOf('* /'), 2, opAssoc.LEFT),
    (oneOf('+ -'), 2, opAssoc.LEFT),
    ])

comparisonExpr = operatorPrecedence(expr,
    [
    ("!", 1, opAssoc.RIGHT),
    (oneOf("< > = <= >= !="), 2, opAssoc.LEFT),
    ])

booleanExpr = operatorPrecedence(TRUE | FALSE | comparisonExpr,
    [
    (NOT, 1, opAssoc.RIGHT),
    (AND, 2, opAssoc.LEFT),
    (OR, 2, opAssoc.LEFT),
    ])

statement = Forward()
printStmt  = PRINT + expr
wasaStmt   = ident + WAS + A + typeSpecifier
becameStmt = ident + BECAME + expr
ifStmt = IF + booleanExpr + statement
statement << Group(printStmt | wasaStmt | becameStmt | ifStmt)

statementList = delimitedList(statement)

tests = """\
    x was a number
    y became 2+5
    print y
    print 100*(5+2)
    print 100*5+2
    if 5 > y print 1000
    if y < 10 y became y+1, print y
    """.splitlines()[:-1]

for t in tests:
    print(t.strip())
    for s in statementList.parseString(t).asList():
        print(s)
    print()

Prints:

x was a number
['x', 'WAS', 'A', 'NUMBER']

y became 2+5
['y', 'BECAME', ['2', '+', '5']]

print y
['PRINT', 'y']

print 100*(5+2)
['PRINT', ['100', '*', ['5', '+', '2']]]

print 100*5+2
['PRINT', [['100', '*', '5'], '+', '2']]

if 5 > y print 1000
['IF', ['5', '>', 'y'], ['PRINT', '1000']]

if y < 10 y became y+1, print y
['IF', ['y', '<', '10'], ['y', 'BECAME', ['y', '+', '1']]]
['PRINT', 'y']

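I took the liberty of adding print as a type of statement, so that it can appear anywhere in the program body. I also tried adding an IF-THEN statement, which shows how adding a control-flow statement like this starts you down the path of writing a recursive grammar (recursion isn't needed just for 'was', 'became', and 'print').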

Answer 1: (score: 3)

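This really depends on how you structure your code and how you want to evaluate it. If you are evaluating as you go along, and everything is evaluated in the right order, you probably don't need anything after the docstring of p_statement_list_comma, i.e. it can stay just as you have it. The statements will be evaluated anyway, and if required you can keep a global dictionary of variables, or something similar, to keep track of state such as identifier values.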
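As a rough illustration of evaluating as you go, the sketch below keeps identifier values in a module-level dictionary. The rule functions, the token names (IDENT, BECAME, PLUS) and the `variables` dictionary are assumptions made for this example, not code from the question. To stay runnable without a lexer, the actions are called by hand through a hypothetical `run_action` helper, with a plain list standing in for the `p` object; in a real parser, yacc makes these calls itself as each rule is reduced.

```python
# Global state shared by the grammar actions (an assumed name).
variables = {}

def p_statement_became(p):
    'statement : IDENT BECAME expr'
    # Evaluate immediately: store the already-computed value of expr.
    variables[p[1]] = p[3]

def p_expr_plus(p):
    'expr : expr PLUS factor'
    p[0] = p[1] + p[3]

def p_factor_ident(p):
    'factor : IDENT'
    # Look the identifier up in the global dictionary.
    p[0] = variables[p[1]]

def run_action(action, *symbols):
    """Hypothetical stand-in for a yacc reduction: slot 0 receives the result."""
    p = [None] + list(symbols)
    action(p)
    return p[0]

# Hand-driven reductions for: y became 2 + 5, x became y + 1
run_action(p_statement_became, 'y', 'became', run_action(p_expr_plus, 2, '+', 5))
y_value = run_action(p_factor_ident, 'y')
run_action(p_statement_became, 'x', 'became', run_action(p_expr_plus, y_value, '+', 1))
print(variables)  # {'y': 7, 'x': 8}
```

Note that p_statement_became never assigns p[0]: the statement's effect is the update to the dictionary, so the rule above it has nothing to collect.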

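If you want to build up a parse tree, e.g. to evaluate it separately because you don't like ply's order of evaluation, you could do something like this: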

def p_statement_list_comma(p):
    'statement-list : statement COMMA statement-list'
    p[0] = [p[1]] + p[3]

def p_statement_print_expr(p):
    'statement-list : PRINT expr'
    p[0] = [p[2]]

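This would give you a list of statements, with the last element of the list being the expression. It uses lists for simplicity; you could use your own classes too if you wished. Just assign whatever Python object you want to p[0] and it will be available to the level above.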
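Once the rules build a list like this, evaluation can be a separate pass over it. The sketch below is one possible shape for that pass, under stated assumptions: statements are taken to be tuples such as ('was', ident, type) and ('became', ident, expr), and expressions are ints, identifier names, or (op, left, right) tuples. None of these node shapes come from the answer above, and letters are left out to keep it short.

```python
def eval_expr(node, env):
    """Evaluate an int literal, an identifier name, or an (op, left, right) tuple."""
    if isinstance(node, int):
        return node
    if isinstance(node, str):
        return env[node]
    op, left, right = node
    lhs, rhs = eval_expr(left, env), eval_expr(right, env)
    return lhs + rhs if op == '+' else lhs - rhs

def run(statement_list):
    """Execute the statements, then return the value of the final print expression."""
    *statements, print_expr = statement_list
    env = {}
    for kind, ident, arg in statements:
        if kind == 'was':
            env[ident] = None  # declared, no value yet
        elif kind == 'became':
            env[ident] = eval_expr(arg, env)
    return eval_expr(print_expr, env)

# A list the rules could yield (assumed shape) for:
#   x was a number, x became 2 + 5, print x - 1
tree = [('was', 'x', 'number'), ('became', 'x', ('+', 2, 5)), ('-', 'x', 1)]
result = run(tree)
print(result)  # 6
```

Keeping the environment local to `run` rather than global means the same tree can be evaluated more than once without leftover state.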

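If you want the result of the print expression to be returned from yacc.parse (which returns the value from the top level of the parse tree), you could do it like this: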

def p_statement_list_comma(p):
    'statement-list : statement COMMA statement-list'
    p[0] = p[3]

def p_statement_print_expr(p):
    'statement-list : PRINT expr'
    p[0] = p[2]
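To see why p[0] = p[3] is enough, it may help to trace the reductions by hand. Because statement-list is right-recursive, the innermost 'PRINT expr' list is reduced first and each enclosing list passes its value upward. The sketch below repeats the two rules and drives them through a hypothetical `run_action` helper, with a plain list standing in for PLY's `p`; the placeholder None for each statement's value is an assumption.

```python
def p_statement_list_comma(p):
    'statement-list : statement COMMA statement-list'
    p[0] = p[3]

def p_statement_print_expr(p):
    'statement-list : PRINT expr'
    p[0] = p[2]

def run_action(action, *symbols):
    """Hypothetical stand-in for a yacc reduction: slot 0 receives the result."""
    p = [None] + list(symbols)
    action(p)
    return p[0]

# Reductions for "s1 , s2 , print 7", innermost list first.
inner = run_action(p_statement_print_expr, 'print', 7)
middle = run_action(p_statement_list_comma, None, ',', inner)
top = run_action(p_statement_list_comma, None, ',', middle)
print(top)  # 7
```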