re.findall returns the correct number of matches, but all of them are empty strings

Time: 2019-09-03 09:44:08

Tags: python regex

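I am trying to build a list of literals from a string of PL/FOL formulas. The relevant code snippet finds the matches but returns them as empty strings.

I tried re.escape(formula), but it had no effect. I also tried simpler variations of the findall pattern, but those produce empty lists.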

import re


def clean(formula):
    formula = formula.strip()
    # Remove spaces just inside parentheses.
    formula = re.sub(r"\( +", "(", formula)
    formula = re.sub(r" +\)", ")", formula)
    # Put a space on each side of every binary operator, then collapse runs of spaces.
    formula = re.sub(r"(?P<b_ops>[&v→↔])", r" \g<b_ops> ", formula)
    formula = re.sub(r"[ ]+", " ", formula)
    # Make an inventory of literals for the original formula.
    orig_lit_inv = re.findall(r"[~]*[A-Z]([a-u]|[w-z]){0,}", formula)
    print(orig_lit_inv)


this_WFF = "(P) & ~(~(Q → (R & ~S)))"
clean(formula=this_WFF)

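When I print the result, I get ['', '', '', '']. In other words, it finds the matches but returns an empty string for each one, when it should at least return the [A-Z] part of each match. With this_WFF as the argument, clean(formula) should print ['P', 'Q', 'R', '~S'].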

1 answer:

Answer 0: (score: 1)

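Quoting re.findall's documentation:

    If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.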
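A small sketch of the cases the documentation describes may make this concrete. The sample strings and patterns here are made up for illustration and are not taken from the question.

```python
import re

text = "Ab Cd"

# No capturing group: findall returns the whole matches.
no_group = re.findall(r"[A-Z][a-z]", text)

# One capturing group: findall returns only what that group captured.
one_group = re.findall(r"[A-Z]([a-z])", text)

# Two capturing groups: findall returns one tuple per match.
two_groups = re.findall(r"([A-Z])([a-z])", text)

# A group that repeats zero times never participates in the match,
# so findall reports it as an empty string.
unused_group = re.findall(r"[A-Z]([a-z])*", "A B")

print(no_group)      # ['Ab', 'Cd']
print(one_group)     # ['b', 'd']
print(two_groups)    # [('A', 'b'), ('C', 'd')]
print(unused_group)  # ['', '']
```

The last case is the one the question runs into: every literal in the sample formula is a single capital letter, so the group matches zero times for each of them.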

Your regex contains a capturing group, so findall will never return anything for the [A-Z] part of the regex. Change ([a-u]|[w-z]) to (?:[a-u]|[w-z]) to see the difference:

>>> import re
>>> this_WFF = "(P) & ~(~(Q → (R & ~S)))"
>>> def clean(formula):
...     formula = formula.strip()
...     formula = re.sub(r"\( +", "(", formula)
...     formula = re.sub(r" +\)", ")", formula)
...     formula = re.sub(r"(?P<b_ops>[&v→↔])", r" \g<b_ops> ", formula)
...     formula = re.sub(r"[ ]+", " ", formula)
...     # Make an inventory of literals for the original formula.
...     orig_lit_inv = re.findall(r"[~]*[A-Z]([a-u]|[w-z]){0,}", formula)
...     print(orig_lit_inv)
... 
>>> clean(this_WFF)
['', '', '', '']
>>> def clean(formula):
...     formula = formula.strip()
...     formula = re.sub(r"\( +", "(", formula)
...     formula = re.sub(r" +\)", ")", formula)
...     formula = re.sub(r"(?P<b_ops>[&v→↔])", r" \g<b_ops> ", formula)
...     formula = re.sub(r"[ ]+", " ", formula)
...     # Make an inventory of literals for the original formula.
...     orig_lit_inv = re.findall(r"[~]*[A-Z](?:[a-u]|[w-z]){0,}", formula)
...     print(orig_lit_inv)
... 
>>> clean(this_WFF)
['P', 'Q', 'R', '~S']

Since the regex no longer contains a capturing group, findall returns only the contents of "group 0" (i.e. the whole match) in the result.
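If the capturing group has to stay for some other reason, one alternative (not part of the original answer) is to use re.finditer and read group 0 from each match object. The sketch below assumes the formula has already been cleaned. The helper names `literals` and `literals_simple` are hypothetical. The second helper writes the two letter ranges as a single character class, which should accept the same letters without needing a group at all.

```python
import re


def literals(formula):
    """Return the full text of each literal, even though the pattern has a capturing group."""
    pattern = r"~*[A-Z]([a-u]|[w-z])*"  # capturing group kept on purpose
    # group(0) is always the whole match, regardless of any groups in the pattern.
    return [m.group(0) for m in re.finditer(pattern, formula)]


def literals_simple(formula):
    """Same inventory using one character class, so findall returns whole matches."""
    return re.findall(r"~*[A-Z][a-uw-z]*", formula)


sample = "(P) & ~(~(Q → (R & ~S)))"
print(literals(sample))         # ['P', 'Q', 'R', '~S']
print(literals_simple(sample))  # ['P', 'Q', 'R', '~S']
```

The finditer form is also handy when the match positions are needed, since each match object exposes start() and end().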