Question

Here is an example:

import re

a = "one two three four five six one three four seven two"
m = re.search("one.*four", a)

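What I want is to find the substring from "one" to "four" that doesn't contain the substring "two" in between. The answer should be: m.group(0) = "one three four", m.start() = 28, m.end() = 41

Is there a way to do this with a single search line?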

Answer 1

You can use this pattern:

one(?:(?!two).)*four

Before matching each additional character, we check that we are not at the start of "two".

Working example: http://regex101.com/r/yY2gG8
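For readers who want to try the pattern directly in Python, here is a minimal sketch that runs it against the string from the question. The variable name `pattern` is only illustrative.

```python
import re

a = "one two three four five six one three four seven two"

# Consume one character at a time, but only at positions where "two" does not start.
pattern = re.compile(r"one(?:(?!two).)*four")

m = pattern.search(a)
print(m.group(0))  # one three four
print(m.span())    # (28, 42)
```

The first "one" is rejected because the repetition stops in front of "two" and no "four" has been reached by then, so the search moves on to the second "one".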

Answer 2

With the harder string Satoru added, this works:

>>> import re
>>> a = "one two three four five six one three four seven two"
>>> re.findall("one(?!.*two.*four).*four", a)
['one three four']

But, someday, you're really going to regret writing tricky regexps. If this were a problem I needed to solve, I'd do it like this:

for m in re.finditer("one.*?four", a):
    if "two" not in m.group():
        break

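It's tricky enough that I used a minimal match (.*?) there. Regexps can be a real pain :-(

Edit: Haha! But the lookahead mess at the top fails again if you make the string harder: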
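The loop above can be wrapped in a small helper so that the "nothing qualified" case is explicit. This is only a sketch: the function name `first_without` and its `forbidden` parameter are hypothetical, not part of the original answer.

```python
import re

def first_without(text, forbidden="two"):
    """Return the first lazy "one...four" match that lacks `forbidden`, else None."""
    for m in re.finditer(r"one.*?four", text):
        if forbidden not in m.group():
            return m
    return None  # the bare loop would instead leave m bound to the last candidate

a = "one two three four five six one three four seven two"
m = first_without(a)
print(m.group(), m.span())            # one three four (28, 42)
print(first_without("one two four"))  # None
```

One caveat with this approach: `re.finditer` returns non-overlapping matches, so in a string such as "one two one three four" the only candidate starts at the first "one" and is rejected, and the later "one three four" is never examined.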

a = "one two three four five six one three four seven two four"

Finally, here is a correct solution:

>>> a = 'one two three four five six one three four seven two four'
>>> m = re.search("one([^t]|t(?!wo))*four", a)
>>> m.group()
'one three four'
>>> m.span()
(28, 42)

I know you said you wanted m.end() to be 41, but that was incorrect.
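The reason 42 is the right value is that m.end() is an exclusive index, one past the last matched character. A short check, reusing the pattern and string from this answer:

```python
import re

a = "one two three four five six one three four seven two four"
m = re.search(r"one([^t]|t(?!wo))*four", a)

start, end = m.span()
print(a[start:end])  # one three four
print(a[end - 1])    # r  (the last character of "four" sits at index 41)
print(end - start == len("one three four"))  # True
```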

Answer 3

You can use a negative lookahead assertion (?!...):

re.findall("one(?!.*two).*four", a)
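As a quick check (a sketch, not part of the original answer), it is worth running this pattern on the question's string. The lookahead `(?!.*two)` rejects a "one" whenever "two" occurs anywhere later in the string, and the question's string ends in "two", so no match is found there. The string `b` below is a hypothetical variant without the trailing "two".

```python
import re

a = "one two three four five six one three four seven two"

# Both occurrences of "one" are followed, somewhere later, by "two".
print(re.findall(r"one(?!.*two).*four", a))  # []

# Hypothetical variant: no "two" after the second "one", so the pattern matches.
b = "one two three four five six one three four seven"
print(re.findall(r"one(?!.*two).*four", b))  # ['one three four']
```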

Answer 4

Another one-liner with a very simple pattern:

import re
line = "one two three four five six one three four seven two"

# Take each lazy "one...four" match, drop the "one"/"four" ends, keep those without "two"
print([X for X in [a.split()[1:-1] for a in
                   re.findall('one.*?four', line, re.DOTALL)] if 'two' not in X])

This gives me:

>>> 
[['three']]

Python regex: find a substring that doesn't contain a substring
