ParserElement.enablePackrat() doesn't make infixNotation faster

Date: 2016-06-20 15:03:33

Tags: pyparsing

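I'm using pyparsing (2.1.5) on Python 3.5.0 and want to make infixNotation faster. I found that other people have used ParserElement.enablePackrat() to improve infixNotation performance, but I haven't been able to get the same result. My code is below.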

from pyparsing import *
ParserElement.enablePackrat()
UNICODE_CHARS = u''.join(
    chr(c) for c in range(65538) if not chr(c).isspace() and
    chr(c) not in '()"'
)
_and_ = Keyword('AND')
_or_ = Keyword('OR')
_not_ = Keyword('NOT')
search_term = ~_and_ + ~_or_ + ~_not_ + Word(UNICODE_CHARS) | QuotedString(
    '"', endQuoteChar='"', unquoteResults=False
)
search_expr = infixNotation(
    search_term,
    [
        (_not_, 1, opAssoc.RIGHT),
        (Optional(_and_), 2, opAssoc.LEFT), 
        (_or_, 2, opAssoc.LEFT),
    ]
)
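# example input (user_string was not defined in the original post)
user_string = 'foo AND NOT bar OR "baz qux"'
parsed_query = search_expr.parseString(user_string)[0].asList()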

1 answer:

Answer 0 (score: 0)

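This use of infixNotation has only 3 levels of operators, so packratting isn't going to do much for you. The improvements usually show up in grammars with 5 or more levels of operators, such as arithmetic expressions.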
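
To see where packrat parsing starts to pay off, here is a minimal timing sketch for a deeper grammar. The six-level arithmetic grammar, the `USE_PACKRAT` flag and the sample input are all illustrative assumptions, not part of the question. Packrat is a global setting, so the comparison assumes you run the script twice in separate processes, once with each flag value.

```python
import timeit

from pyparsing import ParserElement, Word, infixNotation, nums, oneOf, opAssoc

USE_PACKRAT = True  # hypothetical switch: set to False and rerun in a fresh process
if USE_PACKRAT:
    # pyparsing recommends enabling packrat right after import,
    # before any grammar is built.
    ParserElement.enablePackrat()

operand = Word(nums)

# Six precedence levels, highest first (an illustrative arithmetic grammar).
arith_expr = infixNotation(
    operand,
    [
        (oneOf("+ -"), 1, opAssoc.RIGHT),       # unary sign
        ("**", 2, opAssoc.RIGHT),               # exponentiation
        (oneOf("* /"), 2, opAssoc.LEFT),
        (oneOf("+ -"), 2, opAssoc.LEFT),
        (oneOf("< > <= >="), 2, opAssoc.LEFT),
        (oneOf("== !="), 2, opAssoc.LEFT),
    ],
)

sample = "1 + 2 * (3 - 4) ** 2 < 10 == 1"  # made-up test input
elapsed = timeit.timeit(lambda: arith_expr.parseString(sample), number=20)
print("packrat=%s: %.4f s for 20 parses" % (USE_PACKRAT, elapsed))
```

The absolute numbers depend on the machine and the pyparsing version, so treat them only as a relative comparison between the two runs.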

If you are really trying to squeeze more performance out of infixNotation, write your own stripped-down version:

"""
BNF

operand = ~and ~or ~not (A-Za-z0-9)... | quoted_string

atom = 'not'? (operand | '(' expr ')')
and_term = atom ('and' atom)*
or_term = and_term ('or' and_term)*
expr = or_term
"""


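# continues from the question's code: pyparsing is already imported
# and UNICODE_CHARS is defined as above
_and_ = CaselessKeyword('AND')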
_or_ = CaselessKeyword('OR')
_not_ = CaselessKeyword('NOT')
keyword = (_and_ | _or_ | _not_)
search_term = ~keyword + Word(UNICODE_CHARS) | QuotedString('"', endQuoteChar='"', unquoteResults=False)

# use this instead of infixNotation - this is essentially what infixNotation will
# generate, but with fewer FollowedBy's (used to collapse degenerate levels)
LPAR,RPAR = map(Suppress, "()")
expr = Forward()
atom_ = search_term | Group(LPAR + expr + RPAR)
atom = Group(_not_ + atom_) | atom_
and_term = Group(atom + ZeroOrMore(_and_ + atom))
or_term = Group(and_term + ZeroOrMore(_or_ + and_term))
expr <<= or_term

# some simple test cases
tests = """
    p and not q
    p and not q or r
    p and not (q or r)
"""

print("using the hard-coded version")
expr.runTests(tests)

print("compare with using infixNotation")
search_expr = infixNotation(
    search_term,
    [
        (_not_, 1, opAssoc.RIGHT),
        (Optional(_and_), 2, opAssoc.LEFT), 
        (_or_, 2, opAssoc.LEFT),
    ]
)

search_expr.runTests(tests)

The hard-coded version gives output like this:

p and not q
[[['p', 'AND', ['NOT', 'q']]]]
[0]:
  [['p', 'AND', ['NOT', 'q']]]
  [0]:
    ['p', 'AND', ['NOT', 'q']]
    [0]:
      p
    [1]:
      AND
    [2]:
      ['NOT', 'q']


p and not q or r
[[['p', 'AND', ['NOT', 'q']], 'OR', ['r']]]
[0]:
  [['p', 'AND', ['NOT', 'q']], 'OR', ['r']]
  [0]:
    ['p', 'AND', ['NOT', 'q']]
    [0]:
      p
    [1]:
      AND
    [2]:
      ['NOT', 'q']
  [1]:
    OR
  [2]:
    ['r']


p and not (q or r)
[[['p', 'AND', ['NOT', [[['q'], 'OR', ['r']]]]]]]
[0]:
  [['p', 'AND', ['NOT', [[['q'], 'OR', ['r']]]]]]
  [0]:
    ['p', 'AND', ['NOT', [[['q'], 'OR', ['r']]]]]
    [0]:
      p
    [1]:
      AND
    [2]:
      ['NOT', [[['q'], 'OR', ['r']]]]
      [0]:
        NOT
      [1]:
        [[['q'], 'OR', ['r']]]
        [0]:
          [['q'], 'OR', ['r']]
          [0]:
            ['q']
          [1]:
            OR
          [2]:
            ['r']

Using infixNotation gives:

p and not q
[['p', 'AND', ['NOT', 'q']]]
[0]:
  ['p', 'AND', ['NOT', 'q']]
  [0]:
    p
  [1]:
    AND
  [2]:
    ['NOT', 'q']


p and not q or r
[[['p', 'AND', ['NOT', 'q']], 'OR', 'r']]
[0]:
  [['p', 'AND', ['NOT', 'q']], 'OR', 'r']
  [0]:
    ['p', 'AND', ['NOT', 'q']]
    [0]:
      p
    [1]:
      AND
    [2]:
      ['NOT', 'q']
  [1]:
    OR
  [2]:
    r


p and not (q or r)
[['p', 'AND', ['NOT', ['q', 'OR', 'r']]]]
[0]:
  ['p', 'AND', ['NOT', ['q', 'OR', 'r']]]
  [0]:
    p
  [1]:
    AND
  [2]:
    ['NOT', ['q', 'OR', 'r']]
    [0]:
      NOT
    [1]:
      ['q', 'OR', 'r']

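The FollowedBy terms that infixNotation adds collapse degenerate levels by making sure there are 2 or more terms to group before any grouping actually happens. They also keep parse actions from being called on atoms at every level of the operator precedence definition.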

If these don't matter to you, give the stripped-down version a try.
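
If the extra nesting is the only thing you would miss, one option is to flatten it after parsing. The sketch below works on the plain lists returned by asList(). `collapse` is a hypothetical helper, not a pyparsing function, and it only reproduces the collapsing of single-element groups. It does nothing about when parse actions run.

```python
def collapse(node):
    """Recursively replace single-element lists with their only element."""
    if not isinstance(node, list):
        return node  # a plain token such as 'p' or 'AND'
    if len(node) == 1:
        return collapse(node[0])  # degenerate level: unwrap it
    return [collapse(child) for child in node]


# asList() output of the hard-coded parser for "p and not (q or r)"
hard_coded = [[['p', 'AND', ['NOT', [[['q'], 'OR', ['r']]]]]]]
print(collapse(hard_coded))
# prints ['p', 'AND', ['NOT', ['q', 'OR', 'r']]]
```

The result matches element [0] of the infixNotation output shown above for the same input.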

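(Also, take a moment to time your UNICODE_CHARS definition. Generating this string is somewhat time-consuming, so you might want to pre-generate it into a separate module and import it.)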
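
A rough sketch of both steps, using only the standard library: time the construction, then render the string as Python source that can be saved as a module. The function names are made up for illustration, and the `range` limit is copied from the question. `ascii()` is used so that the generated source is pure ASCII, since the range includes surrogate code points that cannot be written out as UTF-8.

```python
import timeit


def build_unicode_chars(limit=65538):
    """Build the string of allowed word characters, as in the question."""
    return ''.join(
        chr(c) for c in range(limit)
        if not chr(c).isspace() and chr(c) not in '()"'
    )


def render_module_source(chars):
    """Return Python source that defines UNICODE_CHARS as an escaped literal."""
    return "UNICODE_CHARS = " + ascii(chars) + "\n"


# Average over a few runs to see what the construction costs at import time.
per_call = timeit.timeit(build_unicode_chars, number=3) / 3
print("building UNICODE_CHARS: about %.4f s per call" % per_call)

UNICODE_CHARS = build_unicode_chars()
module_source = render_module_source(UNICODE_CHARS)
```

Writing `module_source` to a file (for example a hypothetical `unicode_chars.py`) and importing `UNICODE_CHARS` from it replaces the per-character loop with a single string literal. Whether that is measurably faster in your setup is worth checking with the same timing approach.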