帮我找到一个合适的ruby / python解析器生成器

时间:2009-06-04 19:34:33

标签: python ruby debugging parser-generator

我使用的第一个解析器生成器是Parse :: RecDescent,它可用的指南/教程很棒,但它最有用的功能是它的调试工具,特别是跟踪功能(通过将$ RD_TRACE设置为1激活)。我正在寻找一个解析器生成器,可以帮助您调试它的规则。

问题是,它必须用python或ruby编写,并且具有详细的模式/跟踪模式或非常有用的调试技术。

有谁知道这样的解析器生成器?

编辑:当我说调试时,我没有提到调试python或ruby。我指的是调试解析器生成器,看看它在每一步中做了什么,看看它正在读取的每个字符,规则它正在尝试匹配。希望你明白这一点。

BOUNTY EDIT:为了赢得赏金,请展示一个解析器生成器框架,并说明它的一些调试功能。我再说一遍,我对pdb不感兴趣,但是在解析器的调试框架中。另外,请不要提及树梢。我对它不感兴趣。

5 个答案:

答案 0 :(得分:6)

Python是一种非常容易调试的语言。你可以直接导入pdb pdb.settrace()。

但是,这些解析器生成器应该具有良好的调试功能。

http://www.antlr.org/

http://www.dabeaz.com/ply/

http://pyparsing.wikispaces.com/

回应赏金

这是PLY调试的实际操作。

源代码

tokens = (
    'NAME','NUMBER',
    )

literals = ['=','+','-','*','/', '(',')']

# Tokens

t_NAME    = r'[a-zA-Z_][a-zA-Z0-9_]*'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

t_ignore = " \t"

def t_newline(t):
    r'\n+'
    t.lexer.lineno += t.value.count("\n")

def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

# Build the lexer
import ply.lex as lex
lex.lex(debug=1)

# Parsing rules

precedence = (
    ('left','+','-'),
    ('left','*','/'),
    ('right','UMINUS'),
    )

# dictionary of names
names = { }

def p_statement_assign(p):
    'statement : NAME "=" expression'
    names[p[1]] = p[3]

def p_statement_expr(p):
    'statement : expression'
    print(p[1])

def p_expression_binop(p):
    '''expression : expression '+' expression
                  | expression '-' expression
                  | expression '*' expression
                  | expression '/' expression'''
    if p[2] == '+'  : p[0] = p[1] + p[3]
    elif p[2] == '-': p[0] = p[1] - p[3]
    elif p[2] == '*': p[0] = p[1] * p[3]
    elif p[2] == '/': p[0] = p[1] / p[3]

def p_expression_uminus(p):
    "expression : '-' expression %prec UMINUS"
    p[0] = -p[2]

def p_expression_group(p):
    "expression : '(' expression ')'"
    p[0] = p[2]

def p_expression_number(p):
    "expression : NUMBER"
    p[0] = p[1]

def p_expression_name(p):
    "expression : NAME"
    try:
        p[0] = names[p[1]]
    except LookupError:
        print("Undefined name '%s'" % p[1])
        p[0] = 0

def p_error(p):
    if p:
        print("Syntax error at '%s'" % p.value)
    else:
        print("Syntax error at EOF")

import ply.yacc as yacc
yacc.yacc()

import logging
logging.basicConfig(
    level=logging.INFO,
    filename="parselog.txt"
)

while 1:
    try:
        s = raw_input('calc > ')
    except EOFError:
        break
    if not s: continue
    yacc.parse(s, debug=1)

输出

lex: tokens   = ('NAME', 'NUMBER')
lex: literals = ['=', '+', '-', '*', '/', '(', ')']
lex: states   = {'INITIAL': 'inclusive'}
lex: Adding rule t_NUMBER -> '\d+' (state 'INITIAL')
lex: Adding rule t_newline -> '\n+' (state 'INITIAL')
lex: Adding rule t_NAME -> '[a-zA-Z_][a-zA-Z0-9_]*' (state 'INITIAL')
lex: ==== MASTER REGEXS FOLLOW ====
lex: state 'INITIAL' : regex[0] = '(?P<t_NUMBER>\d+)|(?P<t_newline>\n+)|(?P<t_NAME>[a-zA-Z
_][a-zA-Z0-9_]*)'
calc > 2+3
PLY: PARSE DEBUG START

State  : 0
Stack  : . LexToken(NUMBER,2,1,0)
Action : Shift and goto state 3

State  : 3
Stack  : NUMBER . LexToken(+,'+',1,1)
Action : Reduce rule [expression -> NUMBER] with [2] and goto state 9
Result : <int @ 0x1a1896c> (2)

State  : 6
Stack  : expression . LexToken(+,'+',1,1)
Action : Shift and goto state 12

State  : 12
Stack  : expression + . LexToken(NUMBER,3,1,2)
Action : Shift and goto state 3

State  : 3
Stack  : expression + NUMBER . $end
Action : Reduce rule [expression -> NUMBER] with [3] and goto state 9
Result : <int @ 0x1a18960> (3)

State  : 18
Stack  : expression + expression . $end
Action : Reduce rule [expression -> expression + expression] with [2,'+',3] and goto state
 3
Result : <int @ 0x1a18948> (5)

State  : 6
Stack  : expression . $end
Action : Reduce rule [statement -> expression] with [5] and goto state 2
5
Result : <NoneType @ 0x1e1ccef4> (None)

State  : 4
Stack  : statement . $end
Done   : Returning <NoneType @ 0x1e1ccef4> (None)
PLY: PARSE DEBUG END
calc >

在parser.out

生成的解析表
Created by PLY version 3.2 (http://www.dabeaz.com/ply)

Grammar

Rule 0     S' -> statement
Rule 1     statement -> NAME = expression
Rule 2     statement -> expression
Rule 3     expression -> expression + expression
Rule 4     expression -> expression - expression
Rule 5     expression -> expression * expression
Rule 6     expression -> expression / expression
Rule 7     expression -> - expression
Rule 8     expression -> ( expression )
Rule 9     expression -> NUMBER
Rule 10    expression -> NAME

Terminals, with rules where they appear

(                    : 8
)                    : 8
*                    : 5
+                    : 3
-                    : 4 7
/                    : 6
=                    : 1
NAME                 : 1 10
NUMBER               : 9
error                : 

Nonterminals, with rules where they appear

expression           : 1 2 3 3 4 4 5 5 6 6 7 8
statement            : 0

Parsing method: LALR

state 0

    (0) S' -> . statement
    (1) statement -> . NAME = expression
    (2) statement -> . expression
    (3) expression -> . expression + expression
    (4) expression -> . expression - expression
    (5) expression -> . expression * expression
    (6) expression -> . expression / expression
    (7) expression -> . - expression
    (8) expression -> . ( expression )
    (9) expression -> . NUMBER
    (10) expression -> . NAME

    NAME            shift and go to state 1
    -               shift and go to state 2
    (               shift and go to state 5
    NUMBER          shift and go to state 3

    expression                     shift and go to state 6
    statement                      shift and go to state 4

state 1

    (1) statement -> NAME . = expression
    (10) expression -> NAME .

    =               shift and go to state 7
    +               reduce using rule 10 (expression -> NAME .)
    -               reduce using rule 10 (expression -> NAME .)
    *               reduce using rule 10 (expression -> NAME .)
    /               reduce using rule 10 (expression -> NAME .)
    $end            reduce using rule 10 (expression -> NAME .)


state 2

    (7) expression -> - . expression
    (3) expression -> . expression + expression
    (4) expression -> . expression - expression
    (5) expression -> . expression * expression
    (6) expression -> . expression / expression
    (7) expression -> . - expression
    (8) expression -> . ( expression )
    (9) expression -> . NUMBER
    (10) expression -> . NAME

    -               shift and go to state 2
    (               shift and go to state 5
    NUMBER          shift and go to state 3
    NAME            shift and go to state 8

    expression                     shift and go to state 9

state 3

    (9) expression -> NUMBER .

    +               reduce using rule 9 (expression -> NUMBER .)
    -               reduce using rule 9 (expression -> NUMBER .)
    *               reduce using rule 9 (expression -> NUMBER .)
    /               reduce using rule 9 (expression -> NUMBER .)
    $end            reduce using rule 9 (expression -> NUMBER .)
    )               reduce using rule 9 (expression -> NUMBER .)


state 4

    (0) S' -> statement .



state 5

    (8) expression -> ( . expression )
    (3) expression -> . expression + expression
    (4) expression -> . expression - expression
    (5) expression -> . expression * expression
    (6) expression -> . expression / expression
    (7) expression -> . - expression
    (8) expression -> . ( expression )
    (9) expression -> . NUMBER
    (10) expression -> . NAME

    -               shift and go to state 2
    (               shift and go to state 5
    NUMBER          shift and go to state 3
    NAME            shift and go to state 8

    expression                     shift and go to state 10

state 6

    (2) statement -> expression .
    (3) expression -> expression . + expression
    (4) expression -> expression . - expression
    (5) expression -> expression . * expression
    (6) expression -> expression . / expression

    $end            reduce using rule 2 (statement -> expression .)
    +               shift and go to state 12
    -               shift and go to state 11
    *               shift and go to state 13
    /               shift and go to state 14


state 7

    (1) statement -> NAME = . expression
    (3) expression -> . expression + expression
    (4) expression -> . expression - expression
    (5) expression -> . expression * expression
    (6) expression -> . expression / expression
    (7) expression -> . - expression
    (8) expression -> . ( expression )
    (9) expression -> . NUMBER
    (10) expression -> . NAME

    -               shift and go to state 2
    (               shift and go to state 5
    NUMBER          shift and go to state 3
    NAME            shift and go to state 8

    expression                     shift and go to state 15

state 8

    (10) expression -> NAME .

    +               reduce using rule 10 (expression -> NAME .)
    -               reduce using rule 10 (expression -> NAME .)
    *               reduce using rule 10 (expression -> NAME .)
    /               reduce using rule 10 (expression -> NAME .)
    $end            reduce using rule 10 (expression -> NAME .)
    )               reduce using rule 10 (expression -> NAME .)


state 9

    (7) expression -> - expression .
    (3) expression -> expression . + expression
    (4) expression -> expression . - expression
    (5) expression -> expression . * expression
    (6) expression -> expression . / expression

    +               reduce using rule 7 (expression -> - expression .)
    -               reduce using rule 7 (expression -> - expression .)
    *               reduce using rule 7 (expression -> - expression .)
    /               reduce using rule 7 (expression -> - expression .)
    $end            reduce using rule 7 (expression -> - expression .)
    )               reduce using rule 7 (expression -> - expression .)

  ! +               [ shift and go to state 12 ]
  ! -               [ shift and go to state 11 ]
  ! *               [ shift and go to state 13 ]
  ! /               [ shift and go to state 14 ]


state 10

    (8) expression -> ( expression . )
    (3) expression -> expression . + expression
    (4) expression -> expression . - expression
    (5) expression -> expression . * expression
    (6) expression -> expression . / expression

    )               shift and go to state 16
    +               shift and go to state 12
    -               shift and go to state 11
    *               shift and go to state 13
    /               shift and go to state 14


state 11

    (4) expression -> expression - . expression
    (3) expression -> . expression + expression
    (4) expression -> . expression - expression
    (5) expression -> . expression * expression
    (6) expression -> . expression / expression
    (7) expression -> . - expression
    (8) expression -> . ( expression )
    (9) expression -> . NUMBER
    (10) expression -> . NAME

    -               shift and go to state 2
    (               shift and go to state 5
    NUMBER          shift and go to state 3
    NAME            shift and go to state 8

    expression                     shift and go to state 17

state 12

    (3) expression -> expression + . expression
    (3) expression -> . expression + expression
    (4) expression -> . expression - expression
    (5) expression -> . expression * expression
    (6) expression -> . expression / expression
    (7) expression -> . - expression
    (8) expression -> . ( expression )
    (9) expression -> . NUMBER
    (10) expression -> . NAME

    -               shift and go to state 2
    (               shift and go to state 5
    NUMBER          shift and go to state 3
    NAME            shift and go to state 8

    expression                     shift and go to state 18

state 13

    (5) expression -> expression * . expression
    (3) expression -> . expression + expression
    (4) expression -> . expression - expression
    (5) expression -> . expression * expression
    (6) expression -> . expression / expression
    (7) expression -> . - expression
    (8) expression -> . ( expression )
    (9) expression -> . NUMBER
    (10) expression -> . NAME

    -               shift and go to state 2
    (               shift and go to state 5
    NUMBER          shift and go to state 3
    NAME            shift and go to state 8

    expression                     shift and go to state 19

state 14

    (6) expression -> expression / . expression
    (3) expression -> . expression + expression
    (4) expression -> . expression - expression
    (5) expression -> . expression * expression
    (6) expression -> . expression / expression
    (7) expression -> . - expression
    (8) expression -> . ( expression )
    (9) expression -> . NUMBER
    (10) expression -> . NAME

    -               shift and go to state 2
    (               shift and go to state 5
    NUMBER          shift and go to state 3
    NAME            shift and go to state 8

    expression                     shift and go to state 20

state 15

    (1) statement -> NAME = expression .
    (3) expression -> expression . + expression
    (4) expression -> expression . - expression
    (5) expression -> expression . * expression
    (6) expression -> expression . / expression

    $end            reduce using rule 1 (statement -> NAME = expression .)
    +               shift and go to state 12
    -               shift and go to state 11
    *               shift and go to state 13
    /               shift and go to state 14


state 16

    (8) expression -> ( expression ) .

    +               reduce using rule 8 (expression -> ( expression ) .)
    -               reduce using rule 8 (expression -> ( expression ) .)
    *               reduce using rule 8 (expression -> ( expression ) .)
    /               reduce using rule 8 (expression -> ( expression ) .)
    $end            reduce using rule 8 (expression -> ( expression ) .)
    )               reduce using rule 8 (expression -> ( expression ) .)


state 17

    (4) expression -> expression - expression .
    (3) expression -> expression . + expression
    (4) expression -> expression . - expression
    (5) expression -> expression . * expression
    (6) expression -> expression . / expression

    +               reduce using rule 4 (expression -> expression - expression .)
    -               reduce using rule 4 (expression -> expression - expression .)
    $end            reduce using rule 4 (expression -> expression - expression .)
    )               reduce using rule 4 (expression -> expression - expression .)
    *               shift and go to state 13
    /               shift and go to state 14

  ! *               [ reduce using rule 4 (expression -> expression - expression .) ]
  ! /               [ reduce using rule 4 (expression -> expression - expression .) ]
  ! +               [ shift and go to state 12 ]
  ! -               [ shift and go to state 11 ]


state 18

    (3) expression -> expression + expression .
    (3) expression -> expression . + expression
    (4) expression -> expression . - expression
    (5) expression -> expression . * expression
    (6) expression -> expression . / expression

    +               reduce using rule 3 (expression -> expression + expression .)
    -               reduce using rule 3 (expression -> expression + expression .)
    $end            reduce using rule 3 (expression -> expression + expression .)
    )               reduce using rule 3 (expression -> expression + expression .)
    *               shift and go to state 13
    /               shift and go to state 14

  ! *               [ reduce using rule 3 (expression -> expression + expression .) ]
  ! /               [ reduce using rule 3 (expression -> expression + expression .) ]
  ! +               [ shift and go to state 12 ]
  ! -               [ shift and go to state 11 ]


state 19

    (5) expression -> expression * expression .
    (3) expression -> expression . + expression
    (4) expression -> expression . - expression
    (5) expression -> expression . * expression
    (6) expression -> expression . / expression

    +               reduce using rule 5 (expression -> expression * expression .)
    -               reduce using rule 5 (expression -> expression * expression .)
    *               reduce using rule 5 (expression -> expression * expression .)
    /               reduce using rule 5 (expression -> expression * expression .)
    $end            reduce using rule 5 (expression -> expression * expression .)
    )               reduce using rule 5 (expression -> expression * expression .)

  ! +               [ shift and go to state 12 ]
  ! -               [ shift and go to state 11 ]
  ! *               [ shift and go to state 13 ]
  ! /               [ shift and go to state 14 ]


state 20

    (6) expression -> expression / expression .
    (3) expression -> expression . + expression
    (4) expression -> expression . - expression
    (5) expression -> expression . * expression
    (6) expression -> expression . / expression

    +               reduce using rule 6 (expression -> expression / expression .)
    -               reduce using rule 6 (expression -> expression / expression .)
    *               reduce using rule 6 (expression -> expression / expression .)
    /               reduce using rule 6 (expression -> expression / expression .)
    $end            reduce using rule 6 (expression -> expression / expression .)
    )               reduce using rule 6 (expression -> expression / expression .)

  ! +               [ shift and go to state 12 ]
  ! -               [ shift and go to state 11 ]
  ! *               [ shift and go to state 13 ]
  ! /               [ shift and go to state 14 ]

答案 1 :(得分:2)

我对它的调试功能一无所知,但我听说过有关PyParsing的好消息。

http://pyparsing.wikispaces.com/

答案 2 :(得分:2)

我知道已经声明了赏金,但这里是一个用pyparsing编写的等效解析器(加上支持带有零个或多个逗号分隔参数的函数调用):

from pyparsing import *

LPAR, RPAR = map(Suppress,"()")
EQ = Literal("=")
name = Word(alphas, alphanums+"_").setName("name")
number = Word(nums).setName("number")

expr = Forward()
operand = Optional('-') + (Group(name + LPAR + 
                                  Group(Optional(delimitedList(expr))) +
                                  RPAR) |
                           name | 
                           number | 
                           Group(LPAR + expr + RPAR))
binop = oneOf("+ - * / **")
expr << (Group(operand + OneOrMore(binop + operand)) | operand)

assignment = name + EQ + expr
statement = assignment | expr

此测试代码通过其基本步骤运行解析器:

tests = """\
    sin(pi/2)
    y = mx+b
    E = mc ** 2
    F = m*a
    x = x0 + v*t +a*t*t/2
    1 - sqrt(sin(t)**2 + cos(t)**2)""".splitlines()

for t in tests:
    print t.strip()
    print statement.parseString(t).asList()
    print

给出这个输出:

sin(pi/2)
[['sin', [['pi', '/', '2']]]]

y = mx+b
['y', '=', ['mx', '+', 'b']]

E = mc ** 2
['E', '=', ['mc', '**', '2']]

F = m*a
['F', '=', ['m', '*', 'a']]

x = x0 + v*t +a*t*t/2
['x', '=', ['x0', '+', 'v', '*', 't', '+', 'a', '*', 't', '*', 't', '/', '2']]

1 - sqrt(sin(t)**2 + cos(t)**2)
[['1', '-', ['sqrt', [[['sin', ['t']], '**', '2', '+', ['cos', ['t']], '**', '2']]]]]

为了进行调试,我们添加以下代码:

# enable debugging for name and number expressions
name.setDebug()
number.setDebug()

现在我们重新解析第一个测试(显示输入字符串和一个简单的列标尺):

t = tests[0]
print ("1234567890"*10)[:len(t)]
print t
statement.parseString(t)
print

给出这个输出:

1234567890123
    sin(pi/2)
Match name at loc 4(1,5)
Matched name -> ['sin']
Match name at loc 4(1,5)
Matched name -> ['sin']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match number at loc 11(1,12)
Matched number -> ['2']
Match name at loc 4(1,5)
Matched name -> ['sin']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match number at loc 11(1,12)
Matched number -> ['2']

Pyparsing还支持packrat解析,一种解析时的memoization(详细了解packratting here)。这是相同的解析序列,但启用了packrat:

same parse, but with packrat parsing enabled
1234567890123
    sin(pi/2)
Match name at loc 4(1,5)
Matched name -> ['sin']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match number at loc 11(1,12)
Matched number -> ['2']

这是一个有趣的练习,有助于我看到其他解析器库的调试功能。

答案 3 :(得分:1)

上面的ANTLR具有生成人类可读且易于理解的代码的优势, 因为它是(一个非常复杂和强大的)自上而下的解析器, 所以你可以使用常规调试器逐步完成它 并看看真正正在做什么。

这就是为什么它是我选择的解析器生成器。

像PLY这样自下而上的解析器生成器具有劣势 对于较大的语法,几乎不可能理解 调试输出真正意味着什么以及为什么 解析表就像它一样。

答案 4 :(得分:0)

Python wiki有一个list语言分析器用Python编写。