Question

I need a regex that captures two groups: the movie and the year. Optionally, there can be a 'from' string between them.

My expected results are:

first_query="matrix 2013" => ('matrix', '2013')
second_query="matrix from 2013" => ('matrix', '2013')
third_query="matrix" => ('matrix', None)

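I tried two patterns on https://regex101.com/ with the Python flavor. I- r"(.+)(?:from ){0,1}([1-2]\d{3})" does not match first_query and third_query, and it also keeps " from" in the first group, which is exactly what I want to avoid.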

II- r"(.+)(?:from ){1}([1-2]\d{3})" works with second_query, but does not match first_query and third_query.

Is it possible to match all three strings while leaving the " from" string out of the first group?

Thanks.
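To see where the " from" goes, the two attempts can be run directly with Python's re module. The greedy (.+) first grabs the whole string and only gives characters back until the year fits, so the optional (?:from ){0,1} never has to match and "from " stays in group 1. This is a small sketch; the variable names are only for illustration.

```python
import re

# The two attempts from the question, unchanged.
attempt_1 = r"(.+)(?:from ){0,1}([1-2]\d{3})"
attempt_2 = r"(.+)(?:from ){1}([1-2]\d{3})"

for name, pat in [("I", attempt_1), ("II", attempt_2)]:
    for query in ["matrix from 2013", "matrix"]:
        m = re.match(pat, query)
        # groups() shows what each capture group actually grabbed; None means no match
        print(name, repr(query), m.groups() if m else None)
# I 'matrix from 2013' ('matrix from ', '2013')
# I 'matrix' None
# II 'matrix from 2013' ('matrix ', '2013')
# II 'matrix' None
```

Even attempt II leaves a trailing space in group 1, because the space before "from" is consumed by (.+), and neither pattern can match a query without a year since the year group is mandatory.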

Answer 1

You can use

^(.+?)(?:\s+(?:from\s+)?([12]\d{3}))?$

See the regex demo.

Details

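^ - start of string
(.+?) - Group 1: one or more characters other than line break characters, as few as possible
(?:\s+(?:from\s+)?([12]\d{3}))? - an optional non-capturing group that matches one or zero occurrences of:
- \s+ - one or more whitespace characters
- (?:from\s+)? - an optional sequence of the substring from followed by one or more whitespace characters
- ([12]\d{3}) - Group 2: 1 or 2 followed by three digits
$ - end of string.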
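A quick way to check this pattern in Python is to compile it and read the groups from a match object. Because group 2 sits inside an optional group, it comes back as None when no year is present, which is the result the question asks for. The extra multi-word query is only an illustration that the lazy (.+?) is not limited to one word.

```python
import re

# Answer 1's pattern; group 2 is None when the optional year part does not match.
pattern = re.compile(r"^(.+?)(?:\s+(?:from\s+)?([12]\d{3}))?$")

for query in ["matrix 2013", "matrix from 2013", "matrix", "the matrix from 2013"]:
    m = pattern.match(query)
    print(m.groups())
# ('matrix', '2013')
# ('matrix', '2013')
# ('matrix', None)
# ('the matrix', '2013')
```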

Answer 2

This produces your expected output, although it does not handle extra spaces before the number well:

import re

pat = r"^(.+?)(?: from)? ?(\d+)?$"


text = """matrix 2013
matrix from 2013
matrix"""

for t in text.split("\n"):
    print(re.findall(pat,t))

Output:

[('matrix', '2013')]
[('matrix', '2013')]
[('matrix', '')]
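re.findall reports a group that did not take part in the match as an empty string, which is why the last line shows '' instead of the None the question expects. Calling re.match and reading groups() gives None for the missing year. A minimal sketch reusing the same pattern:

```python
import re

pat = r"^(.+?)(?: from)? ?(\d+)?$"

for t in ["matrix 2013", "matrix from 2013", "matrix"]:
    m = re.match(pat, t)
    # groups() uses None for an unmatched optional group, unlike findall's ''
    print(m.groups())
# ('matrix', '2013')
# ('matrix', '2013')
# ('matrix', None)
```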

Explanation:

^            start of string
(.+?)        lazy: any characters, as few as possible
(?: from)?   optional non-capturing ` from`
 ?           optional space
(\d+)?$      optional digits until end of string

Demo: https://regex101.com/r/VD0SZb/1

Answer 3

import re

pattern = re.compile(r"""
    ^\s*              # start of string (optional whitespace)
    (?P<title>\S+)    # one or more non-whitespace characters (title)
    (?:\s+from)?      # optionally, some space followed by the word 'from'
    \s*               # optional whitespace
    (?P<year>[0-9]+)? # optional digit string (year)
    \s*$              # end of string (optional whitespace)
""", re.VERBOSE)

for query in ['matrix 2013', 'matrix from 2013', 'matrix']:
    m = pattern.match(query)
    if m:
        print(m.groupdict())

# Prints:
# {'title': 'matrix', 'year': '2013'}
# {'title': 'matrix', 'year': '2013'}
# {'title': 'matrix', 'year': None}

Disclaimer: this regex does not include the logic needed to reject the first two matches on the grounds that The Matrix actually came out in 1999.
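Since the title group in Answer 3 is \S+, it only captures a single word: a query such as 'the matrix 2013' does not match at all, so nothing is printed for it. One possible adjustment (a hypothetical variation, not part of the answer) is to make the title lazy, (?P<title>.+?), as Answers 1 and 2 do, while keeping the named groups and verbose layout:

```python
import re

# Hypothetical variant of Answer 3: a lazy title so multi-word titles work.
pattern = re.compile(r"""
    ^\s*                 # start of string (optional whitespace)
    (?P<title>.+?)       # title: as few characters as possible
    (?:\s+from)?         # optionally, whitespace followed by the word 'from'
    \s*                  # optional whitespace
    (?P<year>[0-9]+)?    # optional digit string (year)
    \s*$                 # end of string (optional whitespace)
""", re.VERBOSE)

for query in ['the matrix 2013', 'the matrix from 2013', 'the matrix']:
    m = pattern.match(query)
    if m:
        print(m.groupdict())
# {'title': 'the matrix', 'year': '2013'}
# {'title': 'the matrix', 'year': '2013'}
# {'title': 'the matrix', 'year': None}
```

The trade-off is that the lazy title relies on the anchored \s*$ to force it to expand, so the engine does more backtracking than with \S+; for short queries like these that cost is negligible.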

Omitting an optional word in a Python 3 regex