Matching an optional string when there can be random strings between groups

Time: 2014-12-24 21:15:18

Tags: python regex

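I'm new to Python and I'm having trouble matching an optional string when there can be any number of arbitrary strings between the groups. Here is an example of what I'm looking for: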

'The quick brown fox jumps over the lazy dog'

I want the word after 'brown' and, if the word 'lazy' is present, the word that follows it as well, i.e.:

'The quick brown fox jumps over the lazy dog'      --> ('fox', 'dog')
'The quick brown fox'                              --> ('fox', '')
'The quick brown fox dfjdnjcnjdn vvvv lazy mouse'  --> ('fox', 'mouse')
'The quick brown fox lazy dog'                     --> ('fox', 'dog')

Here is what I tried, but it doesn't work:

re.findall(r'brown (\S+)(.*?)(lazy )?(\S+)?', str)

What am I doing wrong, and how can I fix it?

2 answers:

Answer 0: (score: 1)

You can use the following pattern to get the words you are looking for:

brown (\S+)(?:.*lazy (\S+))?

This gives a list of tuples, with an empty string in the second position when lazy is not present.

>>> import re
>>> s = """The quick brown fox jumps over the lazy dog
... The quick brown fox
... The quick brown fox dfjdnjcnjdn vvvv lazy mouse
... The quick brown fox lazy dog"""
>>> re.findall(r'brown (\S+)(?:.*lazy (\S+))?', s)
[('fox', 'dog'), ('fox', ''), ('fox', 'mouse'), ('fox', 'dog')]
>>>

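(?: ... ) creates a non-capturing group, so what it matches does not end up in the tuples/list returned by re.findall unless that part is itself inside a capturing group.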
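
To make the effect of the non-capturing group concrete, here is a small sketch comparing the pattern above with a variant whose outer group does capture. The variable names are only illustrative.

```python
import re

s = 'The quick brown fox jumps over the lazy dog'

# Outer group is capturing: findall returns three items per match,
# including everything the optional part consumed.
capturing = re.findall(r'brown (\S+)(.*lazy (\S+))?', s)

# Outer group is non-capturing: only the two inner (\S+) groups are returned.
non_capturing = re.findall(r'brown (\S+)(?:.*lazy (\S+))?', s)

print(capturing)      # [('fox', ' jumps over the lazy dog', 'dog')]
print(non_capturing)  # [('fox', 'dog')]
```

The inner (\S+) after lazy still captures in both cases, because it is a capturing group of its own.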

Answer 1: (score: 0)

You can use the following pattern:

(?:brown|lazy)\s(\S+)

Here is a breakdown of what it matches:

(?:brown|lazy)  # The words 'brown' or 'lazy'
\s              # A whitespace character
(\S+)           # One or more non-whitespace characters
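
The same breakdown can be kept inside the pattern itself with the re.VERBOSE flag, which ignores whitespace in the pattern and allows # comments. This is only a sketch of an alternative way to write it, and the name pattern is illustrative.

```python
import re

# re.VERBOSE lets the pattern carry its own per-part comments.
pattern = re.compile(r"""
    (?:brown|lazy)  # The words 'brown' or 'lazy'
    \s              # A whitespace character
    (\S+)           # One or more non-whitespace characters
""", re.VERBOSE)

result = pattern.findall('The quick brown fox jumps over the lazy dog')
print(result)  # ['fox', 'dog']
```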

Here is a demonstration:

>>> import re
>>> re.findall(r'(?:brown|lazy)\s(\S+)', 'The quick brown fox jumps over the lazy dog')
['fox', 'dog']
>>> re.findall(r'(?:brown|lazy)\s(\S+)', 'The quick brown fox')
['fox']
>>> re.findall(r'(?:brown|lazy)\s(\S+)', 'The quick brown fox dfjdnjcnjdn vvvv lazy mouse')
['fox', 'mouse']
>>> re.findall(r'(?:brown|lazy)\s(\S+)', 'The quick brown fox lazy dog')
['fox', 'dog']
>>>
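
Note that this pattern returns a flat list, so a missing 'lazy' just shortens the list instead of producing the ('fox', '') tuple shown in the question. One possible way to get that shape back, sketched here under assumptions and not part of the original answer, is to capture the keyword too and build a dict from the pairs. The helper name words_after_keywords is hypothetical. If a keyword occurs more than once, the last occurrence wins.

```python
import re

def words_after_keywords(text):
    """Hypothetical helper: return (word after 'brown', word after 'lazy'),
    using '' for a keyword that does not occur."""
    # Each match is a (keyword, following_word) pair, so dict() maps
    # keyword -> following word.
    found = dict(re.findall(r'(brown|lazy)\s(\S+)', text))
    return found.get('brown', ''), found.get('lazy', '')

print(words_after_keywords('The quick brown fox jumps over the lazy dog'))  # ('fox', 'dog')
print(words_after_keywords('The quick brown fox'))                          # ('fox', '')
```

Unlike the pattern in the other answer, this sketch does not require 'lazy' to come after 'brown'.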