Regex: avoiding unnecessary groups

Time: 2014-07-15 11:42:29

Tags: python regex

I have this code:

string = """a = 10 + 15
b = 50 + b
c = a + b
d = c + 50"""


import re

letter = r"([a-z])"
signs = r"(\+|\-|\*|\/)"
# Adjacent string literals are concatenated before .format() is applied.
regex = re.compile(r"{0} = (\d+) {1} (\d+)|"
                   r"{0} = (\d+) {1} {0}|"
                   r"{0} = {0} {1} {0}|"
                   r"{0} = {0} {1} (\d+)".format(letter, signs))

If I do re.search(regex, string).groups() I end up with

('a', '10', '+', '15', None, None, None, None, None, None, None, None, None, None, None, None)
(None, None, None, None, 'b', '50', '+', 'b', None, None, None, None, None, None, None, None)
(None, None, None, None, None, None, None, None, 'c', 'a', '+', 'b', None, None, None, None)
(None, None, None, None, None, None, None, None, None, None, None, None, 'd', 'c', '+', '50')

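But I only want 4 groups: [VAR, VAL1, OPERATOR, VAL2]

I am currently using a list comprehension

[r for r in re.search(regex, string).groups() if r is not None]

but I would like to know whether there is a way to do this in the regex itself.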

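2 Answers:

Answer 0: (score: 2)

In this case it is better to collapse the regex from four separate alternatives into a single, slightly overloaded pattern. That requires changing letter so it no longer carries its own capturing group: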

letter = r"[a-z]"
signs = r"(\+|\-|\*|\/)"
regex = re.compile(r"({0}) = (\d+|{0}) {1} (\d+|{0})".format(letter, signs))
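As a rough check, the sketch below runs this single pattern over the question's input. It assumes each line is matched on its own; the names text and results and the per-line loop are illustrative and not part of the original answer.

```python
import re

# The sample input from the question.
text = """a = 10 + 15
b = 50 + b
c = a + b
d = c + 50"""

letter = r"[a-z]"
signs = r"(\+|\-|\*|\/)"
regex = re.compile(r"({0}) = (\d+|{0}) {1} (\d+|{0})".format(letter, signs))

# Match each line separately; every match yields exactly four groups.
results = [regex.search(line).groups() for line in text.splitlines()]
for groups in results:
    print(groups)
```

Each tuple has the form (VAR, VAL1, OPERATOR, VAL2) with no None padding, because the pattern contains only four capturing groups in total.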

Answer 1: (score: 0)

You can use (?:...) as a non-capturing group for anything that needs grouping but should not be captured. E.g.:

signs = r"(?:\+|\-|\*|\/)"
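A small sketch of the difference, using a made-up expression "10 + 15" and hypothetical names capturing and non_capturing. Note that with (?:...) the operator no longer appears in groups() at all.

```python
import re

# Same pattern twice: once with a capturing operator group, once non-capturing.
capturing = re.compile(r"(\d+) (\+|\-|\*|\/) (\d+)")
non_capturing = re.compile(r"(\d+) (?:\+|\-|\*|\/) (\d+)")

with_op = capturing.search("10 + 15").groups()
without_op = non_capturing.search("10 + 15").groups()

print(with_op)     # ('10', '+', '15')
print(without_op)  # ('10', '15')
```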

However, you can get rid of a lot of the groups by not making signs and letter groups in the first place:

letter = "[a-z]"
signs = "[+*/-]"
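With both fragments left ungrouped, parentheses can be added only where a capture is actually wanted. The sketch below is one possible way to assemble the full pattern; the operand helper and the combined regex are assumptions for illustration, not something this answer spelled out.

```python
import re

letter = r"[a-z]"
signs = r"[+*/-]"  # '-' is last in the class, so it is a literal minus

# Hypothetical helper: an operand is either a number or a single letter.
operand = r"\d+|{0}".format(letter)

# Capturing parentheses are added here, once per value we want back.
regex = re.compile(r"({0}) = ({1}) ({2}) ({1})".format(letter, operand, signs))

first = regex.search("b = 50 + b").groups()
second = regex.search("d = c * 50").groups()
print(first)
print(second)
```

Keeping the building blocks free of groups means the final pattern decides what gets captured, so the group count stays at four no matter how often a fragment is reused.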