Parsing complex filter definitions with pyparsing

Time: 2017-02-01 23:29:34

Tags: python pyparsing

I am trying to parse complex filter definitions that will be applied to a set of data. A typical filter might look like this:

    attribute1 == value1 and (attribute2 >= 3 or attribute3 != value3)

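Of course, with more nesting levels and logical operators, a filter can get far more complex. It all boils down to:

  1. Extracting the "unit filter expressions", e.g. attribute1 == value1
  2. Running each unit filter against the dataset
  3. Combining the results using intersection (and) and union (or)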
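
The steps above can be sketched in plain Python, independent of the parser. This is only an illustration under assumptions: the sample `ROWS`, the helper names `run_unit_filter` and `evaluate`, and the choice to represent each result as a set of row indices are all hypothetical, and the input is assumed to be a nested list shaped like `[operand, 'and' | 'or', operand, ...]`, where a unit filter is a flat `[name, comparison, value]` triple. The `not` operator is not handled here.

```python
import operator

# Hypothetical sample data: each row maps attribute names to values.
ROWS = [
    {"attribute1": "value1", "attribute2": 5, "attribute3": "value3"},
    {"attribute1": "value1", "attribute2": 1, "attribute3": "value3"},
    {"attribute1": "other", "attribute2": 9, "attribute3": "x"},
    {"attribute1": "value1", "attribute2": 0, "attribute3": "x"},
]

COMPARATORS = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
}

def run_unit_filter(rows, name, op, value):
    """Step 2: return the indices of the rows matching one unit filter."""
    compare = COMPARATORS[op]
    return {i for i, row in enumerate(rows)
            if name in row and compare(row[name], value)}

def evaluate(rows, node):
    """Steps 1 and 3: find the unit filters and combine their results."""
    # A unit filter is a flat triple with a comparison operator in the middle.
    if len(node) == 3 and isinstance(node[1], str) and node[1] in COMPARATORS:
        return run_unit_filter(rows, *node)
    # Otherwise: operand, logical operator, operand, logical operator, ...
    result = evaluate(rows, node[0])
    for i in range(1, len(node) - 1, 2):
        rhs = evaluate(rows, node[i + 1])
        # 'and' is a set intersection, 'or' is a set union.
        result = result & rhs if node[i] == "and" else result | rhs
    return result

FILTER = [["attribute1", "==", "value1"], "and",
          [["attribute2", ">=", 3], "or", ["attribute3", "!=", "value3"]]]

print(sorted(evaluate(ROWS, FILTER)))
```

Using sets of row indices keeps the combination step trivial, since `and` and `or` map directly onto `&` and `|`.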

I reused some of the examples Paul McGuire has already provided, so my code looks like this:

    import pyparsing
    
    def process_results(result):
        for key in result.keys():
            print(key + ":" + str(result[key]))
            if key == 'complex_filter':
                process_results(result[key])
    
    
    def parse_filter(filter_string):
        # break these up so we can represent higher precedence for 'and' over 'or'
        not_operator        = pyparsing.oneOf(['not','^'], caseless=True).setResultsName("operator")
        and_operator        = pyparsing.oneOf(['and','&'], caseless=True).setResultsName("operator")
        or_operator         = pyparsing.oneOf(['or' ,'|'], caseless=True).setResultsName("operator")
    
        # db_keyword is okay, but you might just want to use a general 'identifier' expression,
        # you won't have to keep updating as you add other terms to your query language
        ident = pyparsing.Word(pyparsing.alphas+'_'+'-', pyparsing.alphanums+'_'+'-')
    
        # comparison operators
        comparison_operator = pyparsing.oneOf(['==','!=','>','>=','<', '<='])
    
        # instead of generic 'value', define specific value types
        integer = pyparsing.Regex(r'[+-]?\d+').setParseAction(lambda t:int(t[0]))
        float_ = pyparsing.Regex(r'[+-]?\d+\.\d*').setParseAction(lambda t:float(t[0]))
    
        # use pyparsing's QuotedString class for this, it gives you quote escaping, and
        # automatically strips quotes from the parsed text
        quote = pyparsing.QuotedString('"')
    
        # when you are doing boolean expressions, it's always handy to add TRUE and FALSE literals
        literal_true = pyparsing.Keyword('true', caseless=True)
        literal_false = pyparsing.Keyword('false', caseless=True)
        boolean_literal = literal_true | literal_false
    
        # in future, you can expand comparison_operand to be its own operatorPrecedence
        # term, so that you can do things like "nucleon != 1+2" - but this is fine for now
        comparison_operand = quote | ident | float_ | integer
        comparison_expr = pyparsing.Group((quote | ident) + comparison_operator + comparison_operand).setResultsName("unit_filter", listAllMatches=True )
    
    
        grammar = pyparsing.infixNotation(comparison_expr,
            [
            (not_operator, 1, pyparsing.opAssoc.RIGHT),
            (and_operator, 2, pyparsing.opAssoc.LEFT),
            (or_operator,  2, pyparsing.opAssoc.LEFT),
            ]
        ).setResultsName("complex_filter")
    
        res = grammar.parseString(filter_string, parseAll=True)
    
        return res
    
    res = parse_filter('attribute1 == value1 and (attribute2 >= 3 or attribute3 != value3)')
    
    process_results(res)
    

The output is very close to what I want:

    complex_filter:[['attribute1', '==', 'value1'], 'and', [['attribute2', '>=', 3], 'or', ['attribute3', '!=', 'value3']]]
    unit_filter:[['attribute1', '==', 'value1']]
    operator:and
    

As you can see, it does not keep descending into the nested results. I would like the output to be:

    complex_filter:[['attribute1', '==', 'value1'], 'and', [['attribute2', '>=', 3], 'or', ['attribute3', '!=', 'value3']]]
    unit_filter:[['attribute1', '==', 'value1']]
    operator:and
    complex_filter:[['attribute2', '>=', 3], 'or', ['attribute3', '!=', 'value3']]
    unit_filter:[['attribute2', '>=', 3]]
    operator:or
    unit_filter:[['attribute3', '!=', 'value3']]
    

Any idea what I could do to get there? Thanks!
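
To make the target concrete, the desired traversal can be sketched over plain nested lists rather than over the named results. This is a sketch under assumptions, not a statement about how pyparsing exposes nested result names: it assumes the parse result has first been converted to lists (for instance with `ParseResults.asList()`, passing in the inner list shown as `complex_filter` above), and that a unit filter is always a flat three-element list. The names `walk` and `is_unit_filter` are hypothetical.

```python
def is_unit_filter(node):
    """A unit filter is a flat [name, comparison, value] triple."""
    return len(node) == 3 and not any(isinstance(x, list) for x in node)

def walk(node, lines=None):
    """Collect one output line per complex filter, unit filter and operator."""
    if lines is None:
        lines = []
    lines.append("complex_filter:" + str(node))
    for item in node:
        if isinstance(item, str):
            # A bare string at this level is a logical operator.
            lines.append("operator:" + item)
        elif is_unit_filter(item):
            lines.append("unit_filter:" + str([item]))
        else:
            # Anything else is a nested complex filter: recurse into it.
            walk(item, lines)
    return lines

NESTED = [["attribute1", "==", "value1"], "and",
          [["attribute2", ">=", 3], "or", ["attribute3", "!=", "value3"]]]

for line in walk(NESTED):
    print(line)
```

Deciding between "unit filter" and "nested filter" by shape, instead of by result name, is what lets the recursion reach every level.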

0 answers:

No answers yet