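I just discovered the excellent pyparsing module, and I would like to use it to build a query parser.
Basically, I want to be able to parse expressions of the following kind: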
'b_Coherent == "1_2" or (symbol == 2 and nucleon != 3)'
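where b_coherent, symbol and nucleon are keywords of a database.
I carefully read one of the examples shipped with pyparsing (searchparser.py), which I think (I hope!) brought me quite close to my goal, but something is still wrong.
Here is my code: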
from pyparsing import *
logical_operator = oneOf(['and','&','or','|'], caseless=True)
not_operator = oneOf(['not','^'], caseless=True)
db_keyword = oneOf(['nucleon','b_coherent','symbol','mass'], caseless=True)
arithmetic_operator = oneOf(['==','!=','>','>=','<', '<='])
value = Word(alphanums+'_')
quote = Combine('"' + value + '"') | value
selection = db_keyword + arithmetic_operator + (value|quote)
selection = selection + ZeroOrMore(logical_operator+selection)
parenthesis = Forward()
parenthesis << ((selection + parenthesis) | selection)
parenthesis = Combine('(' + parenthesis + ')') | selection
grammar = parenthesis + lineEnd
res = grammar.parseString('b_Coherent == "1_2" or (symbol == 2 and nucleon != 3)')
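I have some trouble fully understanding the Forward object. Maybe that is one reason why my parser does not work correctly. Do you have any idea what is wrong with my grammar?
Thanks a lot for your help,
Eric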
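Answer 0: (score: 1)
You can use Forward to hand-craft your own expression nesting within parentheses, but pyparsing's operatorPrecedence
simplifies this whole process. See below my updated form of your original code, with comments: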
from pyparsing import *
# break these up so we can represent higher precedence for 'and' over 'or'
#~ logical_operator = oneOf(['and','&','or','|'], caseless=True)
not_operator = oneOf(['not','^'], caseless=True)
and_operator = oneOf(['and','&'], caseless=True)
or_operator = oneOf(['or' ,'|'], caseless=True)
# db_keyword is okay, but you might just want to use a general 'identifier' expression,
# you won't have to keep updating as you add other terms to your query language
db_keyword = oneOf(['nucleon','b_coherent','symbol','mass'], caseless=True)
ident = Word(alphas+'_', alphanums+'_')
# these aren't really arithmetic operators, they are comparison operators
#~ arithmetic_operator = oneOf(['==','!=','>','>=','<', '<='])
comparison_operator = oneOf(['==','!=','>','>=','<', '<='])
# instead of generic 'value', define specific value types
#~ value = Word(alphanums+'_')
integer = Regex(r'[+-]?\d+').setParseAction(lambda t:int(t[0]))
float_ = Regex(r'[+-]?\d+\.\d*').setParseAction(lambda t:float(t[0]))
# use pyparsing's QuotedString class for this, it gives you quote escaping, and
# automatically strips quotes from the parsed text
#~ quote = Combine('"' + value + '"') | value
quote = QuotedString('"')
# when you are doing boolean expressions, it's always handy to add TRUE and FALSE literals
literal_true = Keyword('true', caseless=True)
literal_false = Keyword('false', caseless=True)
boolean_literal = literal_true | literal_false
# in future, you can expand comparison_operand to be its own operatorPrecedence
# term, so that you can do things like "nucleon != 1+2" - but this is fine for now
comparison_operand = quote | db_keyword | ident | float_ | integer
comparison_expr = Group(comparison_operand + comparison_operator + comparison_operand)
# all this business is taken care of for you by operatorPrecedence
#~ selection = db_keyword + arithmetic_operator + (value|quote)
#~ selection = selection + ZeroOrMore(logical_operator+selection)
#~ parenthesis = Forward()
#~ parenthesis << ((selection + parenthesis) | selection)
#~ parenthesis = Combine('(' + parenthesis + ')') | selection
#~ grammar = parenthesis + lineEnd
boolean_expr = operatorPrecedence(comparison_expr | boolean_literal,
[
(not_operator, 1, opAssoc.RIGHT),
(and_operator, 2, opAssoc.LEFT),
(or_operator, 2, opAssoc.LEFT),
])
grammar = boolean_expr
res = grammar.parseString('b_Coherent == "1_2" or (symbol == 2 and nucleon != 3)', parseAll=True)
print(res.asList())
prints
[[['b_coherent', '==', '1_2'], 'or', [['symbol', '==', 2], 'and', ['nucleon', '!=', 3]]]]
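From here, I suggest you study the next step: how to create something that can actually be evaluated. Check out the simpleBool.py example on the pyparsing wiki to see how this is done when using operatorPrecedence.
I'm glad to hear you are enjoying pyparsing, welcome!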
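As a rough illustration of that evaluation step, here is a minimal sketch that walks the nested list printed above and tests it against one database row. It is not the approach used in simpleBool.py (which attaches classes to the parsed operators); the `evaluate` function, the `COMPARISONS` table and the `record` dicts are hypothetical names introduced only for this example. It also assumes that any string operand that is a key of the row is a field name, and that every other operand is a literal value.

```python
import operator

# Map each comparison token to the matching function from the standard library.
COMPARISONS = {
    '==': operator.eq, '!=': operator.ne,
    '>': operator.gt, '>=': operator.ge,
    '<': operator.lt, '<=': operator.le,
}

def evaluate(node, record):
    """Evaluate a nested-list parse tree against a dict of field values (hypothetical helper)."""
    if not isinstance(node, list):
        # Leaf operand. Assumption: a string that is a key of the record is a field name;
        # anything else (numbers, quoted strings) is taken as a literal value.
        if isinstance(node, str) and node in record:
            return record[node]
        return node
    if len(node) == 1:
        # Extra nesting level added by the parser around the whole expression.
        return evaluate(node[0], record)
    if node[0] in ('not', '^'):
        return not evaluate(node[1], record)
    # Left-to-right chain: operand, operator, operand, operator, ...
    result = evaluate(node[0], record)
    for op, rhs in zip(node[1::2], node[2::2]):
        if op in COMPARISONS:
            result = COMPARISONS[op](result, evaluate(rhs, record))
        elif op in ('and', '&'):
            result = result and evaluate(rhs, record)
        elif op in ('or', '|'):
            result = result or evaluate(rhs, record)
        else:
            raise ValueError('unknown operator: %r' % (op,))
    return result

# The parse result shown above, written out as a plain list.
tree = [[['b_coherent', '==', '1_2'], 'or',
         [['symbol', '==', 2], 'and', ['nucleon', '!=', 3]]]]

matching_row = {'b_coherent': '0_0', 'symbol': 2, 'nucleon': 4}
other_row = {'b_coherent': '0_0', 'symbol': 2, 'nucleon': 3}

print(evaluate(tree, matching_row))
print(evaluate(tree, other_row))
```

Because the parser strips the quotes from "1_2", a quoted string can no longer be told apart from an identifier at this stage. A more robust design would tag operands with parse actions in the grammar instead of guessing from the record's keys.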
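Answer 1: (score: 0)
Forward declaration of an expression to be defined later - used for recursive grammars, such as algebraic infix notation. When the expression is known, it is assigned to the Forward variable using the '<<' operator.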
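To make that description concrete, here is a minimal sketch of a recursive grammar built by hand with Forward, using the classic camelCase pyparsing names that appear elsewhere in this thread. The names `comparison`, `atom` and `expr` are illustrative only, and the grammar is deliberately smaller than the one in the question (no `not`, no quoted strings, no precedence between `and` and `or`).

```python
from pyparsing import (Forward, Group, Suppress, Word, ZeroOrMore,
                       alphas, alphanums, nums, oneOf)

ident = Word(alphas + '_', alphanums + '_')
number = Word(nums)
comparison = Group(ident + oneOf('== != >= <= > <') + (number | ident))
bool_op = oneOf('and or', caseless=True)

# Declare the placeholder first, so it can be referenced before it is defined.
expr = Forward()

# An atom is either a plain comparison or a complete expression in parentheses;
# this reference back to expr is what makes the grammar recursive.
atom = comparison | Group(Suppress('(') + expr + Suppress(')'))

# Fill in the placeholder. The same Forward object is kept and '<<' inserts the
# definition into it; the right-hand side is parenthesized because '<<' binds
# more tightly than '|'.
expr << (atom + ZeroOrMore(bool_op + atom))

result = expr.parseString('a == 1 or (b != 2 and c > 3)', parseAll=True).asList()
print(result)
```

The key point is that `expr` is never rebound to a new object after the `<<` line. Assigning a new value to the same name afterwards, as the `parenthesis = Combine(...)` line in the question does, leaves the recursive reference pointing at the old definition, which has no parentheses in it.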