Hoping Paul McGuire spots this and rescues me...
I grabbed the 'regex inverter' example script: http://pyparsing.wikispaces.com/file/view/invRegex.py
I'm trying to hack in support for Python named groups, e.g. (?P<blob_key>[a-zA-Z0-9-_=]+)
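For readers unfamiliar with the syntax, here is a minimal sketch of what a named group does in Python's own `re` module, using the test pattern from this question. The sample path is made up for illustration and is not from the original post.

```python
import re

# The test pattern from the question; "blob_key" is the group's name.
pattern = re.compile(r"serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/")

# Hypothetical input path, chosen only to exercise the pattern.
match = pattern.match("serve_blob/AB/abc-123_=/")

# A named group can be read back by name instead of by position.
blob_key = match.group("blob_key")
groups = match.groupdict()

print(blob_key)
print(groups)
```

An inverter has to recognise the `?P<name>` prefix and then treat the rest of the parenthesised group like any other group.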
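I'm new to pyparsing, and I realise a regex parser may not be the best way to learn it (I'm just trying to get something practical done with the result).
I've edited the parser function as follows: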
def parser():
    global _parser
    if _parser is None:
        lbrack = Literal("[")
        rbrack = Literal("]")
        lbrace = Literal("{")
        rbrace = Literal("}")
        lparen = Literal("(")
        rparen = Literal(")")
        pyspec = Literal("?P")
        langle = Literal("<")
        rangle = Literal(">")
        reMacro = Combine("\\" + oneOf(list("dws")))
        escapedChar = ~reMacro + Combine("\\" + oneOf(list(printables)))
        reLiteralChar = "".join(c for c in printables if c not in r"\[]{}().*?+|")
        reRange = Combine(lbrack + SkipTo(rbrack, ignore=escapedChar) + rbrack)
        reLiteral = ( escapedChar | oneOf(list(reLiteralChar)) )
        reDot = Literal(".")
        repetition = (
            ( lbrace + Word(nums).setResultsName("count") + rbrace ) |
            ( lbrace + Word(nums).setResultsName("minCount") + "," + Word(nums).setResultsName("maxCount") + rbrace ) |
            oneOf(list("*+?"))
            )
        reNamedGroup = Combine(lparen + pyspec + langle + SkipTo(rangle) + rangle
                               + SkipTo(rparen, include=True) + rparen)
        reNamedGroup.setParseAction(handleNamedGroup)
        reRange.setParseAction(handleRange)
        reLiteral.setParseAction(handleLiteral)
        reMacro.setParseAction(handleMacro)
        reDot.setParseAction(handleDot)
        reTerm = ( reLiteral | reNamedGroup | reRange | reMacro | reDot )
        reExpr = operatorPrecedence( reTerm,
            [
            (repetition, 1, opAssoc.LEFT, handleRepetition),
            (None, 2, opAssoc.LEFT, handleSequence),
            (Suppress('|'), 2, opAssoc.LEFT, handleAlternative),
            ]
            )
        _parser = reExpr
    return _parser
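When I run this against my test regex, reNamedGroup seems to find and process the named group correctly (I added some logging inside SkipTo and a few other methods...), but it doesn't seem to take part in the output at all, and my handleNamedGroup function is never called.
The log output is as follows: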
invert(r'serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/')
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 12
DEBUG:root: *** 15, A-Z
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 12
DEBUG:root: *** 15, A-Z
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 24
DEBUG:root: *** 32, blob_key
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 33
DEBUG:root: * 49, [')'], [a-zA-Z0-9-_=]+
DEBUG:root: ** ['[a-zA-Z0-9-_=]+', ')']
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 24
DEBUG:root: *** 32, blob_key
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 33
DEBUG:root: * 49, [')'], [a-zA-Z0-9-_=]+
DEBUG:root: ** ['[a-zA-Z0-9-_=]+', ')']
DEBUG:root: handleLiteral: ['s']
DEBUG:root: handleLiteral: ['e']
DEBUG:root: handleLiteral: ['r']
DEBUG:root: handleLiteral: ['v']
DEBUG:root: handleLiteral: ['e']
DEBUG:root: handleLiteral: ['_']
DEBUG:root: handleLiteral: ['b']
DEBUG:root: handleLiteral: ['l']
DEBUG:root: handleLiteral: ['o']
DEBUG:root: handleLiteral: ['b']
DEBUG:root: handleLiteral: ['/']
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 12
DEBUG:root: *** 15, A-Z
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 12
DEBUG:root: *** 15, A-Z
DEBUG:root: handleRange: ['[A-Z]']
DEBUG:root: handleRepetition: [[[ABCDEFGHIJKLMNOPQRSTUVWXYZ], '{', '2', '}']]
DEBUG:root: handleLiteral: ['/']
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 24
DEBUG:root: *** 32, blob_key
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 33
DEBUG:root: * 49, [')'], [a-zA-Z0-9-_=]+
DEBUG:root: ** ['[a-zA-Z0-9-_=]+', ')']
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 24
DEBUG:root: *** 32, blob_key
DEBUG:root: serve_blob/[A-Z]{2}/(?P<blob_key>[a-zA-Z0-9-_=]+)/, 33
DEBUG:root: * 49, [')'], [a-zA-Z0-9-_=]+
DEBUG:root: ** ['[a-zA-Z0-9-_=]+', ')']
DEBUG:root: handleSequence: [[Lit:s, Lit:e, Lit:r, Lit:v, Lit:e, Lit:_, Lit:b, Lit:l, Lit:o, Lit:b, Lit:/, <libs.exreg.exreg.GroupEmitter object at 0x34cfa30>, Lit:/]]
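The lines prefixed with ** are the skipRes value returned from SkipTo... it looks correct to me. The part I can't understand is why those results are being ignored.
I'm acutely aware that I'm just blindly copying and pasting things... I tried to carefully copy what works for reRange... but the range works and my similar bit doesn't.
My guess is that the surrounding parentheses might be "hiding" the parsed named group from the output at a later stage of parsing, but I'm at a loss as to how.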
Answer 0 (score: 1)
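You don't want to do anything with the parens in your reNamedGroup expression. Note that re groups enclosed in parens have no separately defined syntax in this parser, yet they definitely work. That is because the parens are handled as part of the operatorPrecedence expression. Just change the definition of reNamedGroup to: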
reNamedGroup = pyspec + langle + SkipTo(rangle) + rangle
and let operatorPrecedence handle all of the paren grouping.
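To illustrate the point about parens, here is a minimal sketch, not the invRegex.py grammar itself. It assumes a pyparsing version that provides infixNotation (the newer name for operatorPrecedence), and the single-letter `term` is a stand-in for the real reTerm. No paren expressions are defined, yet a parenthesised group parses and nests.

```python
from pyparsing import Literal, Suppress, Word, alphas, infixNotation, opAssoc

# A single letter stands in for the full reTerm of the real parser.
term = Word(alphas, exact=1)

# Only operators are declared; "(" and ")" are never mentioned.
expr = infixNotation(
    term,
    [
        (Literal("*"), 1, opAssoc.LEFT),   # postfix repetition
        (Suppress("|"), 2, opAssoc.LEFT),  # alternation
    ],
)

# The parenthesised alternation becomes a nested group under "*".
nested = expr.parseString("(a|b)*", parseAll=True).asList()
print(nested)
```

Because the grouping comes for free, a term that also tries to consume "(" and ")" competes with that built-in handling.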
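[Edit by OP]
The change above works on its own, but all output for named groups then began with P or ?, so the pyspec part was somehow leaking into the output. In the end I didn't need to rewrite it in stack form (see comments); the following extra changes made it work: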
reTerm = ( reLiteral | reRange | reMacro | reDot )
reExpr = operatorPrecedence( reTerm,
    [
    (reNamedGroup.suppress(), 1, opAssoc.RIGHT, handleNamedGroup),
    (repetition, 1, opAssoc.LEFT, handleRepetition),
    (None, 2, opAssoc.LEFT, handleSequence),
    (Suppress('|'), 2, opAssoc.LEFT, handleAlternative),
    ]
    )
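As a rough illustration of why .suppress() stops the leak, the sketch below parses just the `?P<name>` prefix on its own, outside operatorPrecedence. The expression names mirror the ones in this thread; the input string is a made-up fragment.

```python
from pyparsing import Literal, SkipTo

pyspec = Literal("?P")
langle = Literal("<")
rangle = Literal(">")
reNamedGroup = pyspec + langle + SkipTo(rangle) + rangle

# Hypothetical fragment: only the prefix of a named group.
text = "?P<blob_key>"

# Without suppress(), every matched piece shows up as a token.
kept = reNamedGroup.parseString(text).asList()

# With suppress(), the prefix still has to match but yields no tokens.
dropped = reNamedGroup.suppress().parseString(text).asList()

print(kept)
print(dropped)
```

The suppressed prefix therefore acts purely as a marker: handleNamedGroup sees only the operand that follows it. Note that the group's name is dropped along with the rest of the prefix.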