I have two problems parsing two pesky patterns. Here are some nonsensical examples:
examples = [
"",
"red green",
"#1# red green",
"#1# red green <2>",
"#1,2# red green <2,3>",
"red green ()",
"#1# red green (blue)",
"#1# red green (#5# blue) <2>",
"#1# red green (#5# blue <6>) <2>",
"#1,2# red green (#5# blue (purple) <6>;#7# yellow <10>) <2,3>",
"#1,2# red (maroon) green (#5# blue (purple) <6>;#7# yellow <10>) <2,3>",
]
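At this point I should say that I have no control over how these strings are created.
As you can see, basically every pattern I want to parse is optional. Below are the different parts I care about. I see the structure of these examples as: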
[cars] [colors] [comments] [buyers]
where comments consists of a sub-structure and may contain several of them, separated by semicolons.
comments: ([cars] [colors] [buyers]; ...)
To get at the content, I created the following grammar:
import pyparsing as pp
integer = pp.pyparsing_common.integer
car_ref = "#" + pp.Group(pp.delimitedList(integer))("cars") + "#"
buyer_ref = "<" + pp.Group(pp.delimitedList(integer))("buyers") + ">"
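My problems are:
How do I tell parentheses that are part of colors apart from those that introduce comments?
My idea for the comments was to use ; as a delimiter and split them apart. However, I did not manage to implement that strategy. What I tried is:
sub_comment = (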
    pp.Optional(car_ref) +
    pp.Group(pp.ZeroOrMore(pp.Regex(r"[^;#<>\s]")))("colors") +
    pp.Optional(buyer_ref)
)
split_comments = pp.Optional(pp.delimitedList(
    pp.Group(sub_comment)("comments*"),
    delim=";"
))
def parse_comments(original, location, tokens):
    # Strip the parentheses.
    return split_comments.transformString(original[tokens[0] + 1:tokens[2] - 1])
comments = pp.originalTextFor(pp.nestedExpr()).setParseAction(parse_comments)
When I use this, everything ends up as one contiguous string, presumably because of the outer pp.originalTextFor.
res = comments.parseString("(#5# blue (purple) <6>;#7# yellow <10>)", parseAll=True)
Edit:
Taking the last example string, I would like to end up with an object structure similar to the following:
{
"cars": [1, 2],
"colors": "red (maroon) green",
"buyers": [2, 3],
"comments": [
{
"cars": [5],
"colors": "blue (purple)",
"buyers": [6]
},
{
"cars": [7],
"colors": "yellow",
"buyers": [10]
}
]
}
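So the parentheses inside the colors section should be kept in order, just as in prose. For the parentheses that introduce a comments section, I don't care about their order, nor about the order of the individual comments.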
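Answer 0 (score: 2)
I think you have most of the pieces in place and are only struggling with the recursive part, where comments can themselves contain substructures, including more comments.
You have this as your BNF: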
structure ::= [cars] [colors] [comments] [buyers]
cars ::= '#' integer, ... '#'
buyers ::= '<' integer, ... '>'
Based on the examples you gave, I filled in the gaps with these guesses:
color ::= word composed of alphas
colors ::= (color | '(' color ')' )...
comments ::= '(' structure ';' ... ')'
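I took your definitions for cars and buyers, and added colors and a recursive definition for comments. From there it was a fairly rote conversion from the BNF to pyparsing expressions: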
integer = pp.pyparsing_common.integer
car_ref = "#" + pp.Group(pp.delimitedList(integer))("cars") + "#"
buyer_ref = "<" + pp.Group(pp.delimitedList(integer))("buyers") + ">"
# not sure if this will be sufficient for color, but it works for the given examples
color = pp.Word(pp.alphas)
colors = pp.originalTextFor(pp.OneOrMore(color | '(' + color + ')'))("colors")
# define comment placeholder so it can be used in definition of structure
comment = pp.Forward()
structure = pp.Group(pp.Optional(car_ref)
                     + pp.Optional(colors)
                     + pp.Optional(comment)("comments")
                     + pp.Optional(buyer_ref))
# now insert the definition of a comment as a delimited list of structures; this takes care of
# any nesting of comments within comments
LPAREN, RPAREN = map(pp.Suppress, "()")
comment <<= pp.Group(LPAREN + pp.Optional(pp.delimitedList(structure, delim=';')) + RPAREN)
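The tricky part is defining the contents of comment as a delimited list of structure, and using the <<= operator to insert that definition into the previously defined Forward() placeholder.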
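To isolate the Forward() technique from the rest of the grammar, here is a minimal sketch on a toy input. The nested integer list and the names nested and item are assumptions made up for illustration; they are not part of the question's grammar.

```python
import pyparsing as pp

# Hypothetical toy grammar: a parenthesized, ';'-separated list of integers
# that may itself contain further parenthesized lists.
LPAREN, RPAREN = map(pp.Suppress, "()")
integer = pp.pyparsing_common.integer

# Declare the placeholder first so it can be referenced before it is defined.
nested = pp.Forward()
item = integer | nested

# "<<=" fills in the existing Forward object in place, so `item`
# (which already refers to `nested`) picks up the recursive definition.
nested <<= pp.Group(LPAREN + pp.Optional(pp.delimitedList(item, delim=";")) + RPAREN)

result = nested.parseString("(1;(2;3);4)", parseAll=True)
print(result.asList())
```

Each level of parentheses becomes one nested list in the result, which is the same mechanism that lets a comment contain structures that in turn contain comments.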
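Passing your examples to structure.runTests() gives the following. (By default, runTests treats Python-style '#' lines as comments, so we have to disable that when calling it with your examples, since a leading '#' is a valid introducer for cars.)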
structure.runTests(examples, comment=None)
red green
[['red green']]
[0]:
['red green']
- colors: 'red green'
#1# red green
[['#', [1], '#', 'red green']]
[0]:
['#', [1], '#', 'red green']
- cars: [1]
- colors: 'red green'
#1# red green <2>
[['#', [1], '#', 'red green', '<', [2], '>']]
[0]:
['#', [1], '#', 'red green', '<', [2], '>']
- buyers: [2]
- cars: [1]
- colors: 'red green'
#1,2# red green <2,3>
[['#', [1, 2], '#', 'red green', '<', [2, 3], '>']]
[0]:
['#', [1, 2], '#', 'red green', '<', [2, 3], '>']
- buyers: [2, 3]
- cars: [1, 2]
- colors: 'red green'
red green ()
[['red green', [[]]]]
[0]:
['red green', [[]]]
- colors: 'red green'
- comments: [[]]
[0]:
[]
#1# red green (blue)
[['#', [1], '#', 'red green (blue)']]
[0]:
['#', [1], '#', 'red green (blue)']
- cars: [1]
- colors: 'red green (blue)'
#1# red green (#5# blue) <2>
[['#', [1], '#', 'red green', [['#', [5], '#', 'blue']], '<', [2], '>']]
[0]:
['#', [1], '#', 'red green', [['#', [5], '#', 'blue']], '<', [2], '>']
- buyers: [2]
- cars: [1]
- colors: 'red green'
- comments: [['#', [5], '#', 'blue']]
[0]:
['#', [5], '#', 'blue']
- cars: [5]
- colors: 'blue'
#1# red green (#5# blue <6>) <2>
[['#', [1], '#', 'red green', [['#', [5], '#', 'blue', '<', [6], '>']], '<', [2], '>']]
[0]:
['#', [1], '#', 'red green', [['#', [5], '#', 'blue', '<', [6], '>']], '<', [2], '>']
- buyers: [2]
- cars: [1]
- colors: 'red green'
- comments: [['#', [5], '#', 'blue', '<', [6], '>']]
[0]:
['#', [5], '#', 'blue', '<', [6], '>']
- buyers: [6]
- cars: [5]
- colors: 'blue'
#1,2# red green (#5# blue (purple) <6>;#7# yellow <10>) <2,3>
[['#', [1, 2], '#', 'red green', [['#', [5], '#', 'blue (purple)', '<', [6], '>'], ['#', [7], '#', 'yellow', '<', [10], '>']], '<', [2, 3], '>']]
[0]:
['#', [1, 2], '#', 'red green', [['#', [5], '#', 'blue (purple)', '<', [6], '>'], ['#', [7], '#', 'yellow', '<', [10], '>']], '<', [2, 3], '>']
- buyers: [2, 3]
- cars: [1, 2]
- colors: 'red green'
- comments: [['#', [5], '#', 'blue (purple)', '<', [6], '>'], ['#', [7], '#', 'yellow', '<', [10], '>']]
[0]:
['#', [5], '#', 'blue (purple)', '<', [6], '>']
- buyers: [6]
- cars: [5]
- colors: 'blue (purple)'
[1]:
['#', [7], '#', 'yellow', '<', [10], '>']
- buyers: [10]
- cars: [7]
- colors: 'yellow'
#1,2# red (maroon) green (#5# blue (purple) <6>;#7# yellow <10>) <2,3>
[['#', [1, 2], '#', 'red (maroon) green', [['#', [5], '#', 'blue (purple)', '<', [6], '>'], ['#', [7], '#', 'yellow', '<', [10], '>']], '<', [2, 3], '>']]
[0]:
['#', [1, 2], '#', 'red (maroon) green', [['#', [5], '#', 'blue (purple)', '<', [6], '>'], ['#', [7], '#', 'yellow', '<', [10], '>']], '<', [2, 3], '>']
- buyers: [2, 3]
- cars: [1, 2]
- colors: 'red (maroon) green'
- comments: [['#', [5], '#', 'blue (purple)', '<', [6], '>'], ['#', [7], '#', 'yellow', '<', [10], '>']]
[0]:
['#', [5], '#', 'blue (purple)', '<', [6], '>']
- buyers: [6]
- cars: [5]
- colors: 'blue (purple)'
[1]:
['#', [7], '#', 'yellow', '<', [10], '>']
- buyers: [10]
- cars: [7]
- colors: 'yellow'
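The effect of comment=None can be checked in isolation. The sketch below uses a deliberately simplified stand-in expression (expr is hypothetical, not the full grammar) and relies on runTests returning a (success, results) tuple with one entry per string it actually parsed.

```python
import pyparsing as pp

integer = pp.pyparsing_common.integer
# Simplified stand-in for the full grammar: an optional car list, then words.
expr = (
    pp.Optional("#" + pp.Group(pp.delimitedList(integer))("cars") + "#")
    + pp.OneOrMore(pp.Word(pp.alphas))
)

tests = ["red green", "#1# red green"]

# Default comment="#": the second string is treated as a comment line
# and is never handed to the parser.
_, default_results = expr.runTests(tests, printResults=False)

# comment=None: every string is parsed.
_, all_results = expr.runTests(tests, comment=None, printResults=False)

print(len(default_results), len(all_results))
```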
If you use asDict() to convert all the parse results to regular Python dicts, you get:
structure.runTests(examples, comment=None,
                   postParse=lambda test, results: results[0].asDict()
                   )
red green
{'colors': 'red green'}
#1# red green
{'cars': [1], 'colors': 'red green'}
#1# red green <2>
{'colors': 'red green', 'cars': [1], 'buyers': [2]}
#1,2# red green <2,3>
{'colors': 'red green', 'cars': [1, 2], 'buyers': [2, 3]}
red green ()
{'comments': [[]], 'colors': 'red green'}
#1# red green (blue)
{'cars': [1], 'colors': 'red green (blue)'}
#1# red green (#5# blue) <2>
{'colors': 'red green', 'cars': [1], 'comments': [{'cars': [5], 'colors': 'blue'}], 'buyers': [2]}
#1# red green (#5# blue <6>) <2>
{'colors': 'red green', 'cars': [1], 'comments': [{'colors': 'blue', 'cars': [5], 'buyers': [6]}], 'buyers': [2]}
#1,2# red green (#5# blue (purple) <6>;#7# yellow <10>) <2,3>
{'colors': 'red green', 'cars': [1, 2], 'comments': [{'colors': 'blue (purple)', 'cars': [5], 'buyers': [6]}, {'colors': 'yellow', 'cars': [7], 'buyers': [10]}], 'buyers': [2, 3]}
#1,2# red (maroon) green (#5# blue (purple) <6>;#7# yellow <10>) <2,3>
{'colors': 'red (maroon) green', 'cars': [1, 2], 'comments': [{'colors': 'blue (purple)', 'cars': [5], 'buyers': [6]}, {'colors': 'yellow', 'cars': [7], 'buyers': [10]}], 'buyers': [2, 3]}
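Outside of runTests, a single string can be converted the same way. The following sketch repeats the grammar from above so that it runs on its own; the helper name parse_structure is an assumption for illustration, not something from the question.

```python
import pyparsing as pp

integer = pp.pyparsing_common.integer
car_ref = "#" + pp.Group(pp.delimitedList(integer))("cars") + "#"
buyer_ref = "<" + pp.Group(pp.delimitedList(integer))("buyers") + ">"
color = pp.Word(pp.alphas)
colors = pp.originalTextFor(pp.OneOrMore(color | "(" + color + ")"))("colors")

comment = pp.Forward()
structure = pp.Group(
    pp.Optional(car_ref)
    + pp.Optional(colors)
    + pp.Optional(comment)("comments")
    + pp.Optional(buyer_ref)
)
LPAREN, RPAREN = map(pp.Suppress, "()")
comment <<= pp.Group(LPAREN + pp.Optional(pp.delimitedList(structure, delim=";")) + RPAREN)


def parse_structure(text):
    """Hypothetical helper: parse one string and return a plain dict."""
    # structure is a Group, so the named results live on element [0].
    return structure.parseString(text, parseAll=True)[0].asDict()


parsed = parse_structure(
    "#1,2# red (maroon) green (#5# blue (purple) <6>;#7# yellow <10>) <2,3>"
)
print(parsed["colors"])
for sub in parsed["comments"]:
    print(sub)
```

The resulting dict has the cars, colors, buyers and comments keys requested in the question, with each comment being a dict of the same shape.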