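PEG recursive grammar

Time: 2016-04-17 22:36:32

Tags: python parsing grammar peg

I am trying to write a Tiger parser. I initially used PyPEG, but after running into some difficulties I switched to Arpeggio.

My grammar is quite simple.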

def number(): return _(r'[0-9]+')
def string(): return _(r"\".*?\"")
def id(): return _(r'[a-zA-Z][a-zA-Z0-9_]*')

def literal(): return [number, string]

def simple_var(): return id

def let_in_exp(): return 'let', 'in', Optional(ZeroOrMore(exp)), 'end'

param = [number, string]
params = Optional(param, ZeroOrMore(',', param))

def function_call(): return id, '(', params, ')'

exp = [let_in_exp, simple_var, literal, function_call]

def code(): return OneOrMore(exp), EOF

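The difficulty lies in the let-in-exp expression. let in let in let in end end end is valid Tiger.

However, Arpeggio currently does not recognize the let-in-exp as such; it sees three simple-var instead. What happens is that, once inside ZeroOrMore(exp), it consumes the end, so the enclosing let-in-exp can no longer find it.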
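The cause can be checked without any parser library: the regular expression used by the id rule also matches the keywords, so simple_var accepts end as an ordinary identifier. A minimal sketch using only the standard re module (the names ID and matches are illustrative, not part of the grammar):

```python
import re

# Same pattern as the id() rule in the grammar above.
ID = re.compile(r'[a-zA-Z][a-zA-Z0-9_]*')

# The pattern cannot tell keywords apart from an ordinary identifier.
matches = {word: bool(ID.fullmatch(word)) for word in ["let", "in", "end", "foo"]}
print(matches)
```

Every word, keyword or not, is a valid id as far as this pattern is concerned.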

How can I solve this?

2 answers:

Answer 0 (score: 3)

Not an Arpeggio solution, but perhaps more to your taste?

from pyparsing import (CaselessKeyword,Word,nums,QuotedString,alphas,alphanums,
    Forward,Group,Optional,OneOrMore,ZeroOrMore,delimitedList)

LET,IN,END = map(CaselessKeyword, "let in end".split())

number = Word(nums).setName("number")
string = QuotedString('"')
ident = ~(LET | IN | END) + Word(alphas, alphanums+'_')
ident.setName("ident")

literal = number | string

simple_var = ident

exp = Forward().setName("exp")
let_in_exp = Group(LET + IN + ZeroOrMore(exp) + END).setName("let_in_exp")

param = number | string
params = delimitedList(param)
function_call = ident() + '(' + Optional(params) + ')'

exp <<= let_in_exp | simple_var | literal | function_call

code = OneOrMore(exp)

tests = """\
    let in let in let in end end end
    let in let in let in "blah" end end end
    let in let in let in "blah" end 1729 end end
    """
code.runTests(tests)

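I designed pyparsing to allow expressions to be combined using arithmetic operators:

  • + -> And
  • | -> MatchFirst
  • ^ -> Or (try all, match longest)
  • ~ -> Not
  • & -> Each (same as And, but in any order)
  • * -> multiple (as in expr*3 instead of expr+expr+expr)

I believe these operators, together with plain-language class names like OneOrMore, make these parsers easier to understand and easier to maintain over time.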
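To illustrate the idea behind such operators, here is a toy sketch of how Python operator overloading can compose parsers. It is not pyparsing's implementation: the class P and the helper lit are hypothetical names invented for this example, and only sequence (+), ordered choice (|), and negative lookahead (~) are covered.

```python
class P:
    """Toy parser: wraps a function (text, pos) -> new position, or None on failure."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, text, pos=0):
        return self.fn(text, pos)

    def __add__(self, other):
        # Sequence: run self, then other from where self stopped.
        def seq(text, pos):
            mid = self(text, pos)
            return None if mid is None else other(text, mid)
        return P(seq)

    def __or__(self, other):
        # Ordered choice: the first alternative that matches wins.
        def alt(text, pos):
            end = self(text, pos)
            return end if end is not None else other(text, pos)
        return P(alt)

    def __invert__(self):
        # Negative lookahead: succeeds only if self fails, and consumes nothing.
        return P(lambda text, pos: pos if self(text, pos) is None else None)


def lit(s):
    """Match the literal string s at the current position."""
    return P(lambda text, pos: pos + len(s) if text.startswith(s, pos) else None)


# 'a' or 'b', then anything that is not 'x', then 'c'.
expr = (lit("a") | lit("b")) + ~lit("x") + lit("c")
print(expr("ac"), expr("bc"), expr("axc"))
```

Because ~ is a unary operator, it binds more tightly than +, so ~lit("x") + lit("c") reads as "not x, then c" without extra parentheses.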

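Answer 1 (score: 2)

As Paul has already pointed out, you should use the Not syntactic predicate to keep the simple_var rule from matching keywords. I also suggest not wrapping the ZeroOrMore parsing expression in Optional, since ZeroOrMore already implies it.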
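The effect of the Not predicate can be illustrated without any parser library. The following is a minimal hand-written sketch, not Arpeggio's implementation: it assumes whitespace-separated tokens and covers only let_in_exp and simple_var. The names parse_exp, parse_code, and the reject_keywords flag are hypothetical.

```python
import re

KEYWORDS = {"let", "in", "end"}
ID = re.compile(r'[a-zA-Z][a-zA-Z0-9_]*')


def parse_exp(tokens, pos, reject_keywords):
    """Parse one exp at tokens[pos]; return (tree, new_pos) or None."""
    # Ordered choice, first alternative: let_in_exp.
    if tokens[pos:pos + 2] == ["let", "in"]:
        body, cur = [], pos + 2
        while cur < len(tokens):  # ZeroOrMore(exp)
            result = parse_exp(tokens, cur, reject_keywords)
            if result is None:
                break
            tree, cur = result
            body.append(tree)
        if cur < len(tokens) and tokens[cur] == "end":
            return ("let", body), cur + 1
        # let_in_exp failed: fall through to the next alternative.
    # Second alternative: simple_var, optionally guarded like Not(keyword).
    if pos < len(tokens) and ID.fullmatch(tokens[pos]):
        if reject_keywords and tokens[pos] in KEYWORDS:
            return None
        return ("var", tokens[pos]), pos + 1
    return None


def parse_code(text, reject_keywords):
    """OneOrMore(exp) up to end of input; return a list of trees or None."""
    tokens, pos, trees = text.split(), 0, []
    while pos < len(tokens):
        result = parse_exp(tokens, pos, reject_keywords)
        if result is None:
            return None
        tree, pos = result
        trees.append(tree)
    return trees or None


# Without the guard, 'end' is swallowed as a variable and the let fails.
print(parse_code("let in end", reject_keywords=False))
# With the guard, the nested lets are recognized.
print(parse_code("let in let in let in end end end", reject_keywords=True))
```

In the unguarded run the input degrades into plain variables, which is the behavior described in the question; with the guard, each end is left for the enclosing let_in_exp.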

The Arpeggio solution is

from arpeggio import Not, OneOrMore, ZeroOrMore, Optional, EOF, ParserPython
from arpeggio import RegExMatch as _

keyword = ['let', 'in', 'end']
def number(): return _(r'[0-9]+')
def string(): return _(r"\".*?\"")
def id(): return _(r'[a-zA-Z][a-zA-Z0-9_]*')

def literal(): return [number, string]
def simple_var(): return Not(keyword), id
def let_in_exp(): return 'let', 'in', ZeroOrMore(exp), 'end'

param = [number, string]
params = Optional(param, ZeroOrMore(',', param))

def function_call(): return id, '(', params, ')'

exp = [let_in_exp, simple_var, literal, function_call]

def code(): return OneOrMore(exp), EOF

parser = ParserPython(code, debug=True)
test = 'let in 42 let in "foo" let in end end end'
parse_tree = parser.parse(test)

This produces the following parse tree:

[image: parse tree]