Question

I am trying to parse complex filter definitions that will be applied to a set of data. A typical filter might look like this:

attribute1 == value1 and (attribute2 >= 3 or attribute3 != value3)

Of course, a filter can get much more complex, with more nesting levels and logical operators. It all boils down to:

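extracting the "unit filter expressions", e.g. attribute1 == value1
running each unit filter against the dataset
combining the results with intersections (and) and unions (or)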
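
To make the three steps concrete, here is a minimal sketch in plain Python, with no parsing involved yet. The dataset `rows`, the helper `run_unit_filter`, and the `COMPARATORS` table are all hypothetical names invented for this illustration. Each unit filter returns the set of matching row indices, so "and" becomes a set intersection and "or" a set union.

```python
import operator

# Map comparison tokens to Python functions (illustrative table, not part of pyparsing).
COMPARATORS = {
    '==': operator.eq, '!=': operator.ne,
    '>': operator.gt, '>=': operator.ge,
    '<': operator.lt, '<=': operator.le,
}

def run_unit_filter(rows, attribute, op, value):
    """Return the set of row indices for which `attribute op value` holds."""
    compare = COMPARATORS[op]
    return {i for i, row in enumerate(rows)
            if attribute in row and compare(row[attribute], value)}

# Hypothetical dataset, invented for this example.
rows = [
    {'attribute1': 'value1', 'attribute2': 5, 'attribute3': 'value3'},
    {'attribute1': 'value1', 'attribute2': 1, 'attribute3': 'value3'},
    {'attribute1': 'other',  'attribute2': 9, 'attribute3': 'x'},
    {'attribute1': 'value1', 'attribute2': 0, 'attribute3': 'y'},
]

# attribute1 == value1 and (attribute2 >= 3 or attribute3 != value3)
a = run_unit_filter(rows, 'attribute1', '==', 'value1')
b = run_unit_filter(rows, 'attribute2', '>=', 3)
c = run_unit_filter(rows, 'attribute3', '!=', 'value3')

matching = a & (b | c)   # intersection for "and", union for "or"
print(sorted(matching))
```

Here the values are compared as plain Python objects, so a real dataset would need consistent types per attribute.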

I reused some of the examples Paul McGuire has already provided, so my code looks like this:

import pyparsing

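def process_results(result):
    # Print every named result at this level, then descend only into
    # the entry stored under the 'complex_filter' results name.
    for key in result.keys():
        print(key + ":" + str(result[key]))
        if key == 'complex_filter':
            process_results(result[key])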


def parse_filter(filter_string):
    # break these up so we can represent higher precedence for 'and' over 'or'
    not_operator        = pyparsing.oneOf(['not','^'], caseless=True).setResultsName("operator")
    and_operator        = pyparsing.oneOf(['and','&'], caseless=True).setResultsName("operator")
    or_operator         = pyparsing.oneOf(['or' ,'|'], caseless=True).setResultsName("operator")

    # db_keyword is okay, but you might just want to use a general 'identifier' expression,
    # you won't have to keep updating as you add other terms to your query language
    ident = pyparsing.Word(pyparsing.alphas+'_'+'-', pyparsing.alphanums+'_'+'-')

    # comparison operators
    comparison_operator = pyparsing.oneOf(['==','!=','>','>=','<', '<='])

    # instead of generic 'value', define specific value types
    integer = pyparsing.Regex(r'[+-]?\d+').setParseAction(lambda t:int(t[0]))
    float_ = pyparsing.Regex(r'[+-]?\d+\.\d*').setParseAction(lambda t:float(t[0]))

    # use pyparsing's QuotedString class for this, it gives you quote escaping, and
    # automatically strips quotes from the parsed text
    quote = pyparsing.QuotedString('"')

    # when you are doing boolean expressions, it's always handy to add TRUE and FALSE literals
    literal_true = pyparsing.Keyword('true', caseless=True)
    literal_false = pyparsing.Keyword('false', caseless=True)
    boolean_literal = literal_true | literal_false

    # in future, you can expand comparison_operand to be its own operatorPrecedence
    # term, so that you can do things like "nucleon != 1+2" - but this is fine for now
    comparison_operand = quote | ident | float_ | integer
    comparison_expr = pyparsing.Group((quote | ident) + comparison_operator + comparison_operand).setResultsName("unit_filter", listAllMatches=True )


    grammar = pyparsing.infixNotation(comparison_expr,
        [
        (not_operator, 1, pyparsing.opAssoc.RIGHT),
        (and_operator, 2, pyparsing.opAssoc.LEFT),
        (or_operator,  2, pyparsing.opAssoc.LEFT),
        ]
    ).setResultsName("complex_filter")

    res = grammar.parseString(filter_string, parseAll=True)

    return res

res = parse_filter('attribute1 == value1 and (attribute2 >= 3 or attribute3 != value3)')

process_results(res)

The output is very close to what I want:

complex_filter:[['attribute1', '==', 'value1'], 'and', [['attribute2', '>=', 3], 'or', ['attribute3', '!=', 'value3']]]
unit_filter:[['attribute1', '==', 'value1']]
operator:and

As you can see, it does not keep looping through the "nested" results... I would like the output to be:

complex_filter:[['attribute1', '==', 'value1'], 'and', [['attribute2', '>=', 3], 'or', ['attribute3', '!=', 'value3']]]
unit_filter:[['attribute1', '==', 'value1']]
operator:and
complex_filter: [['attribute2', '>=', 3], 'or', ['attribute3', '!=', 'value3']]
unit_filter:[['attribute2', '>=', 3]]
operator:or
unit_filter:[['attribute3', '!=', 'value3']]

Any idea what I can do to get there? Thanks!
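
One possible direction, offered as a sketch rather than a confirmed fix: the results name "complex_filter" appears to be attached only to the outermost expression, so the nested groups do not seem to carry that key. Walking the nested list structure directly avoids relying on results names at all. The tree below is hard-coded from the output shown above so that the example runs without pyparsing; the assumption is that something like `res.asList()[0]` would yield the same nested list. The helpers `is_unit_filter` and `describe` are hypothetical names, and only lists and operator strings are handled.

```python
def is_unit_filter(node):
    """A unit filter is a flat list such as ['attribute2', '>=', 3]."""
    return isinstance(node, list) and not any(isinstance(x, list) for x in node)

def describe(node):
    """Yield one output line per filter and operator, visiting the tree in order."""
    if is_unit_filter(node):
        yield "unit_filter:" + str([node])
        return
    yield "complex_filter:" + str(node)
    for item in node:
        if isinstance(item, str):
            # Strings at this level are logical operators ('and', 'or', 'not').
            yield "operator:" + item
        else:
            # Nested list: either a unit filter or another complex filter.
            yield from describe(item)

# Hard-coded copy of the parsed structure from the question's output.
tree = [['attribute1', '==', 'value1'], 'and',
        [['attribute2', '>=', 3], 'or', ['attribute3', '!=', 'value3']]]

lines = list(describe(tree))
for line in lines:
    print(line)
```

The same recursion could return sets of matching rows instead of strings, combining them with intersection for "and" and union for "or".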

Parsing complex filter definitions using pyparsing
