I am trying to write a parser for Tiger. I initially used PyPEG, but after running into some difficulties I switched to Arpeggio.
My grammar is quite simple:
def number(): return _(r'[0-9]+')
def string(): return _(r"\".*?\"")
def id(): return _(r'[a-zA-Z][a-zA-Z0-9_]*')
def literal(): return [number, string]
def simple_var(): return id
def let_in_exp(): return 'let', 'in', Optional(ZeroOrMore(exp)), 'end'
param = [number, string]
params = Optional(param, ZeroOrMore(',', param))
def function_call(): return id, '(', params, ')'
exp = [let_in_exp, simple_var, literal, function_call]
def code(): return OneOrMore(exp), EOF
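The difficulty lies with the let-in-exp expression.
let in let in let in end end end
is valid Tiger.
However, Arpeggio currently does not recognize the let-in-exp as such; it recognizes three simple-var instead. Once it enters ZeroOrMore(exp), it consumes each end as a simple-var, so the closing 'end' of the let-in-exp is never found.
How can I solve this problem?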
Answer 0 (score: 3)
Not an Arpeggio solution, but perhaps more to your taste?
from pyparsing import (CaselessKeyword,Word,nums,QuotedString,alphas,alphanums,
Forward,Group,Optional,OneOrMore,ZeroOrMore,delimitedList)
LET,IN,END = map(CaselessKeyword, "let in end".split())
number = Word(nums).setName("number")
string = QuotedString('"')
ident = ~(LET | IN | END) + Word(alphas, alphanums+'_')
ident.setName("ident")
literal = number | string
simple_var = ident
exp = Forward().setName("exp")
let_in_exp = Group(LET + IN + ZeroOrMore(exp) + END).setName("let_in_exp")
param = number | string
params = delimitedList(param)
function_call = ident() + '(' + Optional(params) + ')'
exp <<= let_in_exp | simple_var | literal | function_call
code = OneOrMore(exp)
tests = """\
let in let in let in end end end
let in let in let in "blah" end end end
let in let in let in "blah" end 1729 end end
"""
code.runTests(tests)
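I designed pyparsing to allow expressions to be composed using arithmetic operators:
+ -> And
| -> MatchFirst
^ -> Or (try all alternatives, match the longest)
~ -> Not
& -> Each (same as And, but in any order)
* -> multiple (as in expr*3 instead of expr+expr+expr)
I believe that these operators, together with plain-language class names such as OneOrMore, make these parsers easier to understand and to maintain over time.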
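To illustrate how operator-based composition works in principle, here is a minimal stdlib-only sketch. It is not pyparsing's implementation: the class P and the helpers token and many are hypothetical names invented for this example, and only +, | and ~ are modeled. It also shows why the negative lookahead on the identifier rule matters for the nested let/in/end input.

```python
import re

class P:
    """Tiny PEG-style parser: fn(text, pos) returns (values, new_pos) or None."""
    def __init__(self, fn):
        self.fn = fn

    def __add__(self, other):       # sequence (the role And plays in pyparsing)
        def seq(text, pos):
            first = self.fn(text, pos)
            if first is None:
                return None
            second = other.fn(text, first[1])
            if second is None:
                return None
            return first[0] + second[0], second[1]
        return P(seq)

    def __or__(self, other):        # ordered choice (the role of MatchFirst)
        def alt(text, pos):
            result = self.fn(text, pos)
            return result if result is not None else other.fn(text, pos)
        return P(alt)

    def __invert__(self):           # negative lookahead (the role of Not)
        def neg(text, pos):
            return ([], pos) if self.fn(text, pos) is None else None
        return P(neg)

def token(pattern):
    """Match a regex after optional whitespace; the value is a one-item list."""
    rx = re.compile(r"\s*(" + pattern + r")")
    def run(text, pos):
        m = rx.match(text, pos)
        return ([m.group(1)], m.end()) if m else None
    return P(run)

def many(p):
    """Zero or more repetitions of p (stops if p fails or consumes nothing)."""
    def run(text, pos):
        values = []
        while True:
            result = p.fn(text, pos)
            if result is None or result[1] == pos:
                return values, pos
            values += result[0]
            pos = result[1]
    return P(run)

# Keywords are matched as whole words only.
LET, IN, END = (token(k + r"\b") for k in ("let", "in", "end"))
number = token(r"[0-9]+")
plain_ident = token(r"[a-zA-Z][a-zA-Z0-9_]*")
ident = ~(LET | IN | END) + plain_ident      # identifier that is not a keyword

def _exp(text, pos):                         # forward reference to let_in_exp
    return (let_in_exp | ident | number).fn(text, pos)

let_in_exp = LET + IN + many(P(_exp)) + END

def _exp_bad(text, pos):                     # same grammar without the lookahead
    return (let_in_bad | plain_ident | number).fn(text, pos)

let_in_bad = LET + IN + many(P(_exp_bad)) + END

text = "let in let in let in end end end"
good = let_in_exp.fn(text, 0)   # succeeds and consumes the whole input
bad = let_in_bad.fn(text, 0)    # None: every 'end' was eaten as an identifier
```

In the version without the lookahead, the repetition swallows every end as an identifier, so the closing keyword is never available to the enclosing rule, which is the failure described in the question.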
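Answer 1 (score: 2)
As Paul has already pointed out, you should use the Not syntactic predicate to keep the simple_var rule from matching keywords. In addition, I suggest not wrapping the ZeroOrMore parsing expression in Optional, since optionality is already implied.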
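The keyword exclusion can also be sketched outside any parser library, folded directly into the identifier pattern. The following is a minimal illustration using only Python's re module; the names KEYWORDS, IDENT and match_ident are hypothetical. It makes one assumption explicit that depends on the library's configuration elsewhere: a keyword should be rejected only when it is a whole word, so identifiers that merely start with a keyword remain valid.

```python
import re

# Keywords that must never be accepted as identifiers.
KEYWORDS = ("let", "in", "end")

# The negative lookahead rejects a keyword only when it stands as a whole
# word (\b), so names such as "index" or "letter" are still identifiers.
IDENT = re.compile(r"(?!(?:%s)\b)[a-zA-Z][a-zA-Z0-9_]*" % "|".join(KEYWORDS))

def match_ident(text, pos=0):
    """Return the identifier starting at pos, or None if there is none."""
    m = IDENT.match(text, pos)
    return m.group(0) if m else None
```

For example, match_ident("end") returns None while match_ident("index") returns "index". Without the word boundary, a plain prefix check on 'in' would also reject "index".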
A solution in Arpeggio would be:
from arpeggio import Not, OneOrMore, ZeroOrMore, Optional, EOF, ParserPython
from arpeggio import RegExMatch as _
keyword = ['let', 'in', 'end']
def number(): return _(r'[0-9]+')
def string(): return _(r"\".*?\"")
def id(): return _(r'[a-zA-Z][a-zA-Z0-9_]*')
def literal(): return [number, string]
def simple_var(): return Not(keyword), id
def let_in_exp(): return 'let', 'in', ZeroOrMore(exp), 'end'
param = [number, string]
params = Optional(param, ZeroOrMore(',', param))
def function_call(): return id, '(', params, ')'
exp = [let_in_exp, simple_var, literal, function_call]
def code(): return OneOrMore(exp), EOF
parser = ParserPython(code, debug=True)
test = 'let in 42 let in "foo" let in end end end'
parse_tree = parser.parse(test)
This will produce the following parse tree