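I'm new to Python and am having trouble matching an optional string when there can be any number of strings between the groups. Here is an example of what I'm looking for: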
'The quick brown fox jumps over the lazy dog'
I want the word after 'brown' and, if the word 'lazy' is present, the word that follows it as well, i.e.:
'The quick brown fox jumps over the lazy dog' --> ('fox', 'dog')
'The quick brown fox' --> ('fox', '')
'The quick brown fox dfjdnjcnjdn vvvv lazy mouse' --> ('fox', 'mouse')
'The quick brown fox lazy dog' --> ('fox', 'dog')
This is what I tried, but it doesn't work:
re.findall(r'brown (\S+)(.*?)(lazy )?(\S+)?', str)
What am I doing wrong, and how can I fix it?
Answer 0: (score: 1)
You can use the following to get the words you're looking for:
brown (\S+)(?:.*lazy (\S+))?
This gives a list of tuples, with an empty string in the second position if lazy is not present.
>>> import re
>>> s = """The quick brown fox jumps over the lazy dog
... The quick brown fox
... The quick brown fox dfjdnjcnjdn vvvv lazy mouse
... The quick brown fox lazy dog"""
>>> re.findall(r'brown (\S+)(?:.*lazy (\S+))?', s)
[('fox', 'dog'), ('fox', ''), ('fox', 'mouse'), ('fox', 'dog')]
>>>
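(?: ... ) creates a non-capturing group, so what it matches is not included in the tuples/list returned by re.findall unless that part is itself inside a capturing group.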
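To make the difference from the attempt in the question concrete, here is a small sketch. In the original pattern, the lazy `(.*?)` is allowed to match nothing and both groups after it are optional, so the regex can stop right after 'fox' without ever looking for 'lazy'. The names `ATTEMPT`, `PATTERN` and `words_after` are made up for this illustration; the helper is just one possible way to get a single tuple per input string.

```python
import re

# The pattern from the question: every part after (\S+) may match nothing.
ATTEMPT = r'brown (\S+)(.*?)(lazy )?(\S+)?'

# The pattern from this answer: 'lazy' and the word after it are optional
# only as a unit, and the greedy .* searches the rest of the line for 'lazy'.
PATTERN = re.compile(r'brown (\S+)(?:.*lazy (\S+))?')


def words_after(text):
    """Return (word after 'brown', word after 'lazy' or '') or None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    # groups(default='') turns a group that did not take part into ''.
    return match.groups(default='')


sentence = 'The quick brown fox jumps over the lazy dog'
print(re.findall(ATTEMPT, sentence))       # [('fox', '', '', '')]
print(words_after(sentence))               # ('fox', 'dog')
print(words_after('The quick brown fox'))  # ('fox', '')
```

Because `.*` is greedy, a line that contains 'lazy' more than once would yield the word after the last occurrence; whether that is the desired behaviour depends on the data.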
Answer 1: (score: 0)
You can use the following pattern:
(?:brown|lazy)\s(\S+)
Here is a breakdown of what it matches:
(?:brown|lazy) # The words 'brown' or 'lazy'
\s # A whitespace character
(\S+) # One or more non-whitespace characters
Here is a demonstration:
>>> import re
>>> re.findall(r'(?:brown|lazy)\s(\S+)', 'The quick brown fox jumps over the lazy dog')
['fox', 'dog']
>>> re.findall(r'(?:brown|lazy)\s(\S+)', 'The quick brown fox')
['fox']
>>> re.findall(r'(?:brown|lazy)\s(\S+)', 'The quick brown fox dfjdnjcnjdn vvvv lazy mouse')
['fox', 'mouse']
>>> re.findall(r'(?:brown|lazy)\s(\S+)', 'The quick brown fox lazy dog')
['fox', 'dog']
>>>
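Note that this pattern returns a flat list of words, not the `('fox', '')` tuples shown in the question, and it treats 'brown' and 'lazy' independently. The sketch below compares it with a nested-optional pattern on a few extra inputs that are not from the question; `ALTERNATION`, `NESTED` and `compare` are names chosen only for this illustration.

```python
import re

# Flat alternation: the word after either 'brown' or 'lazy', in text order.
ALTERNATION = re.compile(r'(?:brown|lazy)\s(\S+)')

# Nested optional group: 'lazy' only counts when it comes after 'brown'.
NESTED = re.compile(r'brown (\S+)(?:.*lazy (\S+))?')


def compare(text):
    """Return the findall results of both patterns for the same text."""
    return ALTERNATION.findall(text), NESTED.findall(text)


print(compare('The quick brown fox'))
# (['fox'], [('fox', '')])
print(compare('The lazy dog'))
# (['dog'], [])
print(compare('The lazy dog chased the brown fox'))
# (['dog', 'fox'], [('fox', '')])
```

If 'lazy' can appear without 'brown', or before it, the two patterns give different answers, so the choice depends on which of those cases should count as a match.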