How to capture everything before an operator in Pyparsing

Date: 2018-08-03 06:54:37

Tags: python pyparsing

Referencing: Pyparsing problem with operators

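I am trying to build a pyparsing grammar. I want a run of space-separated words that appears before an "and"/"or" operator to be captured as a single token.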

The expected result is:

(United kingdom or Sweden)
['United kingdom','or','Sweden']

What I get instead is:

['United', 'kingdom','or','Sweden']

My code so far:

from pyparsing import *
import json

QUOTED = quotedString.setParseAction(removeQuotes)
OAND = CaselessLiteral("and")
OOR = CaselessLiteral("or")
ONOT = CaselessLiteral("not")
WORDWITHSPACE = Combine(OneOrMore(Word(printables.replace("(", "").replace(")", "")) | White(
    ' ') + ~(White() | OAND | ONOT | OOR)))
TERM = (QUOTED | WORDWITHSPACE)
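# operatorPrecedence is the old, deprecated name for infixNotation and is no longer available in pyparsing 3
EXPRESSION = infixNotation(TERM,
                           [
                               (ONOT, 1, opAssoc.RIGHT),
                               (OAND, 2, opAssoc.LEFT),
                               (OOR, 2, opAssoc.LEFT)
                           ])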

STRING = OneOrMore(EXPRESSION) + StringEnd()

1 answer:

Answer (score: 1)

I redefined WORDWITHSPACE as follows:

# space-separated words are easiest to define using just OneOrMore
# must use a negative lookahead for and/not/or operators, and this must come
# at the beginning of the expression
WORDWITHSPACE = OneOrMore(~(OAND | ONOT | OOR) + Word(printables, excludeChars="()"))

# use a parse action to recombine words into a single string
WORDWITHSPACE.addParseAction(' '.join)
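To see the negative lookahead at work in isolation, here is a minimal standalone sketch (an illustration, not part of the original answer). It reuses the question's CaselessLiteral operators together with the redefined WORDWITHSPACE; the helper name capture_words is hypothetical. Because parseString is called without parseAll, matching simply stops where the run of words ends, which is right before the operator.

```python
from pyparsing import CaselessLiteral, OneOrMore, Word, printables

# Operators defined as in the question
OAND = CaselessLiteral("and")
OOR = CaselessLiteral("or")
ONOT = CaselessLiteral("not")

# The answer's WORDWITHSPACE: each word is only accepted if an operator
# does not start at that position, so the repetition ends before "and"/"or"/"not"
WORDWITHSPACE = OneOrMore(~(OAND | ONOT | OOR) + Word(printables, excludeChars="()"))

# Parse action that joins the matched words back into one string token
WORDWITHSPACE.addParseAction(" ".join)


def capture_words(text):
    """Hypothetical helper: return the tokens matched at the start of text."""
    return WORDWITHSPACE.parseString(text).asList()


print(capture_words("United Kingdom or Sweden"))  # ['United Kingdom']
```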

After making these changes to your code sample, I was able to write:

tests = """
    # basic test
    United Kingdom or Sweden

    # multiple operators at the same precedence level
    United Kingdom or Sweden or France

    # implicit grouping by precedence - 'and' has higher precedence than 'or'
    United Kingdom or Sweden and People's Republic of China

    # use ()'s to override the precedence of 'and' over 'or'
    (United Kingdom or Sweden) and People's Republic of China
    """

EXPRESSION.runTests(tests, fullDump=False)

and got:

# basic test
United Kingdom or Sweden
[['United Kingdom', 'or', 'Sweden']]

# multiple operators at the same precedence level
United Kingdom or Sweden or France
[['United Kingdom', 'or', 'Sweden', 'or', 'France']]

# implicit grouping by precedence - 'and' has higher precedence than 'or'
United Kingdom or Sweden and People's Republic of China
[['United Kingdom', 'or', ['Sweden', 'and', "People's Republic of China"]]]

# use ()'s to override the precedence of 'and' over 'or'
(United Kingdom or Sweden) and People's Republic of China
[[['United Kingdom', 'or', 'Sweden'], 'and', "People's Republic of China"]]
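One caveat worth checking (an observation, not part of the original answer): CaselessLiteral matches a prefix, so the negative lookahead also rejects words that merely begin with an operator, such as "Andorra" (starts with "And") or "Oregon" (starts with "Or"). Below is a sketch that swaps in CaselessKeyword, which only matches whole words, and calls infixNotation directly (the current name for operatorPrecedence). The parse helper is hypothetical.

```python
from pyparsing import (
    CaselessKeyword,
    OneOrMore,
    Word,
    infixNotation,
    opAssoc,
    printables,
    quotedString,
    removeQuotes,
)

# Keywords only match whole words, so "Andorra" or "Oregon" are not
# mistaken for the "and"/"or" operators
OAND = CaselessKeyword("and")
OOR = CaselessKeyword("or")
ONOT = CaselessKeyword("not")
OPERATOR = OAND | OOR | ONOT

# copy() so the shared quotedString element is not modified in place
QUOTED = quotedString.copy().setParseAction(removeQuotes)

# Run of words up to (but not including) the next operator, joined into one string
WORDWITHSPACE = OneOrMore(~OPERATOR + Word(printables, excludeChars="()"))
WORDWITHSPACE.addParseAction(" ".join)

TERM = QUOTED | WORDWITHSPACE
EXPRESSION = infixNotation(
    TERM,
    [
        (ONOT, 1, opAssoc.RIGHT),
        (OAND, 2, opAssoc.LEFT),
        (OOR, 2, opAssoc.LEFT),
    ],
)


def parse(text):
    """Hypothetical helper: parse the whole string and return nested lists."""
    return EXPRESSION.parseString(text, parseAll=True).asList()


print(parse("United Kingdom or Andorra"))  # [['United Kingdom', 'or', 'Andorra']]
```

With the CaselessLiteral definitions from the question, the same input should instead fail to parse, because the right-hand operand of "or" cannot start at "Andorra".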