Two very similar regexes with lookahead assertions in Python - why does re.split() behave differently?

Date: 2011-07-15 20:34:50

Tags: python regex lookahead

I was trying to answer this question, in which the OP has the following string:

"path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism"

and wants to split it into the following list:

['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']

I tried to solve it with a simple lookahead assertion, (?=path:). Well, it doesn't work:

>>> import re
>>> s = "path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism"
>>> r = re.compile('(?=path:)')
>>> r.split(s)
['path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism']

However, in this answer, the answerer made it work by putting a space in front of the lookahead assertion:

>>> line = 'path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism'
>>> re.split(' (?=path:)', line)
['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']
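One way to see what each pattern actually matches is to list the match spans with re.finditer(). A minimal sketch; the helper name match_spans is hypothetical. A zero-width match shows up as a span whose start equals its end:

```python
import re

s = ("path:bte00250 Alanine, aspartate and glutamate metabolism "
     "path:bte00330 Arginine and proline metabolism")

def match_spans(pattern, text):
    """Hypothetical helper: (start, end) of every non-overlapping match of pattern."""
    return [m.span() for m in re.finditer(pattern, text)]

# Pure lookahead: every match is empty (start == end), nothing is consumed.
lookahead_only = match_spans(r'(?=path:)', s)
# Space + lookahead: each match consumes exactly the one space before "path:".
space_lookahead = match_spans(r' (?=path:)', s)

print(lookahead_only)   # [(0, 0), (58, 58)]
print(space_lookahead)  # [(57, 58)]
```

The first pattern only ever reports positions, never characters, which is exactly the kind of match the answer below is about.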

Why does the regex work with the space? Why doesn't it work without it?

1 answer:

Answer 0 (score: 5)

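Python's re.split() has a documented limitation: it will not split a string on a zero-length match. (?=path:) is a pure lookahead, so every match it finds is empty and re.split() never splits. Adding the space makes each match one character long (the space itself), so the split happens, and the space is consumed instead of being left at the end of the first piece. (This describes the Python versions of the time; since Python 3.7, re.split() does split on empty matches.)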
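If you would rather not rely on how re.split() treats empty matches at all, one alternative is to match the records themselves instead of the separators. A minimal sketch, assuming every record starts with "path:" and records are separated by whitespace; the function name split_records is hypothetical:

```python
import re

s = ("path:bte00250 Alanine, aspartate and glutamate metabolism "
     "path:bte00330 Arginine and proline metabolism")

def split_records(text):
    """Hypothetical helper: return each 'path:...' record by matching records, not separators.

    Assumes every record starts with 'path:' and records are separated by whitespace.
    """
    # Lazily take characters until whitespace followed by 'path:', or the end of the string.
    return re.findall(r'path:.*?(?=\s+path:|$)', text)

print(split_records(s))
# ['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']
```

On Python 3.7 and later, re.split(r'(?=path:)', s) does split at the empty match, but the result begins with an empty string and the first piece keeps its trailing space, so the ' (?=path:)' pattern is still the tidier choice there.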