Question

I have a string:

"foo hello world baz 33"

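The part between foo and baz will be one or more space-separated words. I want to match this string with a regex that captures each word in its own group: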

>>> re.match(r'foo (<some re here>) baz (\d+)', "foo hello world baz 33").groups() 
('hello', 'world', '33')

The regex should be flexible enough to also work when there are no surrounding words:

>>> re.match(r'(<some re here>)', "hello world").groups() 
('hello', 'world')

I tried variations of ([\w+\s])+, but I could not capture a dynamically determined number of groups. Is this possible?

Answer 1

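re.match only returns a result if the match starts at the beginning of the string. Use re.search instead. The .*? construct returns the shortest match between two words/expressions (. means any character except a newline by default, * means 0 or more occurrences, and ? makes the match as short as possible).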

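import re

my_str = "foo hello world baz 33"
# Lazily capture everything between "foo " and " baz"
my_pattern = r'foo\s(.*?)\sbaz'
p = re.search(my_pattern, my_str, re.I)
# Split the captured text on whitespace to get the individual words
result = p.group(1).split()
print(result)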

['hello', 'world']
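
The question also asked for the trailing number. As a minimal sketch (not part of the original answer), the same lazy pattern can be extended with a second group for the digits, and the split words can be joined with it into one tuple. The variable names here are only illustrative:

```python
import re

my_str = "foo hello world baz 33"
# Group 1: everything between foo and baz (lazy); group 2: the digits after baz
m = re.search(r'foo\s(.*?)\sbaz\s(\d+)', my_str)
words = m.group(1).split()
groups = tuple(words) + (m.group(2),)
print(groups)  # ('hello', 'world', '33')
```

This does not capture each word in its own regex group; it only rebuilds the tuple the question asked for after the match.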

Edit:

If foo or baz is missing and you need to return the whole string instead, use if-else:

if p is not None:
    result = p.group(1).split()
else:
    result = my_str
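
Wrapped into a function, the fallback might look like the sketch below. The helper name words_between is hypothetical, and splitting the whole string in the fallback branch is an assumption made here so that both cases return a list (the snippet above returns the unsplit string):

```python
import re

def words_between(text, pattern=r'foo\s(.*?)\sbaz'):
    """Hypothetical helper: words between the delimiters, or all words if absent."""
    m = re.search(pattern, text, re.I)
    if m is not None:
        return m.group(1).split()
    # Assumption: without foo/baz, fall back to splitting the whole string
    return text.split()

print(words_between("foo hello world baz 33"))  # ['hello', 'world']
print(words_between("hello world"))             # ['hello', 'world']
```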

Why the ? in the pattern:
Suppose the word baz appears more than once:

my_str =  "foo hello world baz 33 there is another baz"

Using pattern = 'foo\s(.*)\sbaz' gives the longest (greedy) match:

'hello world baz 33 there is another'

However, using pattern = 'foo\s(.*?)\sbaz' returns the shortest match:

'hello world'
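
The two patterns can be compared side by side with a short script (a sketch using the same example string; the variable names greedy and lazy are only illustrative):

```python
import re

my_str = "foo hello world baz 33 there is another baz"
# Greedy: .* runs to the last " baz" in the string
greedy = re.search(r'foo\s(.*)\sbaz', my_str).group(1)
# Lazy: .*? stops at the first " baz"
lazy = re.search(r'foo\s(.*?)\sbaz', my_str).group(1)
print(greedy)  # hello world baz 33 there is another
print(lazy)    # hello world
```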

Answer 2

[This is not a solution, but I will try to explain why it is not possible]

What you are after is something like this:

foo\s(\w+\s)+baz\s(\d+)

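The nice part is that (\w+\s)+ repeats the capture group. The problem is that most regex flavors store only the last match of that capture group; earlier captures are overwritten.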
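
A quick check with Python's re module shows the overwriting behavior (a minimal sketch using the question's string):

```python
import re

m = re.match(r'foo\s(\w+\s)+baz\s(\d+)', "foo hello world baz 33")
# Group 1 matched "hello " and then "world ", but only the last repetition is kept
print(m.groups())  # ('world ', '33')
```

Note that the kept capture includes the trailing space, since \s is inside the group.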

I suggest looping over the string with a simpler regex instead.
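
One way such a loop could look is sketched below. This is an illustration rather than the answerer's own code: the function name collect_words is hypothetical, and it assumes the delimiters appear as whole words. If the closing word never appears, it returns every word after the opening one.

```python
import re

def collect_words(text, first="foo", last="baz"):
    """Hypothetical helper: collect the words between two delimiter words."""
    words = []
    inside = False
    # Walk over the string one word at a time with a simple pattern
    for m in re.finditer(r'\w+', text):
        token = m.group(0)
        if token == first and not inside:
            inside = True
        elif token == last and inside:
            break
        elif inside:
            words.append(token)
    return words

print(collect_words("foo hello world baz 33"))  # ['hello', 'world']
```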

Hope that helps.

Answer 3

Use index to find foo and baz, then split the substring between them.

def find_between( s, first, last ):
    try:
        start = s.index( first ) + len( first )
        end = s.index( last, start )
        return s[start:end].split()
    except ValueError:
        return ""

s = "foo hello world baz 33"
start = "foo"
end = "baz"
print(find_between(s, start, end))

Regex: matching and grouping a variable number of space-separated words