I have the following three strings:
inputs = [
    "Season 12",
    "Season 1a",
    "Season 1-2"
]
I am trying to match only the first one. This is the regular expression I am currently using:
import re

outputs = []
for input in inputs:
    # group(2) is the captured run of digits (it may be empty)
    output = re.search(r'(Staffel|Season|Saison|S\.?)?\s?(\d{0,})(?!(-|[a-z][A-Z]))', input, re.IGNORECASE).group(2)
    outputs.append(output)
assert(outputs == ['12','',''])
# AssertionError, values were ['12', '1', '']
Currently this works for Season 12 and Season 1-2, but not for Season 1a (which should not return anything).
Answer 0 (score: 1)
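import re

inputs = [
    "Season 12",
    "Season 1a",
    "Season 1-2",
    "Seinfeld, Season 1 (UHD)"
]

# The number must be followed by the end of the string or by whitespace.
re_num = re.compile(
    r'(Staffel|Season|Saison|S\.?)\s?((\d+)$|(\d+)\s)',
    flags=re.IGNORECASE
)

for s in inputs:
    m = re_num.search(s)
    if m:
        # group(2) holds the number; it includes the trailing whitespace
        # character when the second alternative matched
        print(s, '-->', m.group(2))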
Result:
Season 12 --> 12
Seinfeld, Season 1 (UHD) --> 1
Answer 1 (score: 1)
Not sure what you want for Seinfeld, Season 1 (UHD). It is captured here, but if you don't want it, change the last part from (?:\s|$) to a plain $.
import re

inputs = [
    "Season 12",
    "Season 1a",
    "Season 1-3",
    "Seinfeld, Season 1 (UHD)",
    "Seinfeld, Season 1"
]

outputs = []
for input in inputs:
    # The digits must be followed by whitespace or the end of the string.
    output = re.search(r'(?:Staffel|Season|Saison|S\.?)?\s(\d+)(?:\s|$)', input, re.IGNORECASE)
    if output is not None:
        outputs.append(output.group(1))
    else:
        outputs.append('')

print(outputs)
assert(outputs == ['12','','','1','1'])
Output:
['12', '', '', '1', '1']
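
For comparison, here is a minimal sketch of the stricter variant mentioned above, with $ in place of (?:\s|$), so the number has to be the last thing in the string. The names SEASON_AT_END and season_number are only illustrative and are not part of the original answer.

```python
import re

# Stricter variant: the digits must sit at the very end of the string.
# SEASON_AT_END and season_number are hypothetical names used for this sketch.
SEASON_AT_END = re.compile(
    r'(?:Staffel|Season|Saison|S\.?)?\s(\d+)$',
    re.IGNORECASE,
)

def season_number(title):
    """Return the trailing season number as a string, or '' if there is none."""
    m = SEASON_AT_END.search(title)
    return m.group(1) if m else ''

titles = [
    "Season 12",
    "Season 1a",
    "Season 1-3",
    "Seinfeld, Season 1 (UHD)",
    "Seinfeld, Season 1",
]
results = [season_number(t) for t in titles]
print(results)
```

With this variant, Seinfeld, Season 1 (UHD) should no longer produce a number, because the digit is followed by more text, while Seinfeld, Season 1 still does.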