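I've run into a small problem using the regular expression library in Python, specifically with the match method and a pattern made of several alternatives: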
import re

files = ["noi100k_0p55m0p3_fow71f",
         "fnoi100v5_71f60s",
         "noi100k_0p55m0p3_151f_560s",
         "noi110v25_560s"]

for i in files:
    keyws = i.split("_")
    for j in keyws:
        if re.match(r"noi(\w+)k|fnoi(\w+)v(\w+)|noi(\w+)v(\w+)", j):
            # Always reads group 1, whichever alternative actually matched
            print("Results :", re.match(r"noi(\w+)k|fnoi(\w+)v(\w+)|noi(\w+)v(\w+)", j).group(1))
The result is:
Results : 100
Results : None
Results : 100
Results : None
whereas I expected:
Results : 100
Results : 100
Results : 100
Results : 110
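The only alternative that yields a value is "noi(\w+)k". It looks as if the other patterns are never tested, but shouldn't re.match(a|b, string) check both the a and the b pattern?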
Answer 0 (score: 1)
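Your groups are numbered from left to right across the whole pattern; if one of the alternatives matches, you need to extract the group that belongs to that alternative.
You have 5 groups, and either group 1, or groups 2 and 3, or groups 4 and 5 will contain the match: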
for j in keyws:
    match = re.match(r"noi(\w+)k|fnoi(\w+)v(\w+)|noi(\w+)v(\w+)", j)
    if match:
        # Groups 1, 2 and 4 are the first (\w+) of each alternative
        results = match.group(1) or match.group(2) or match.group(4)
        print("Results :", results)
This prints the first \w+ group of whichever alternative matched.
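To see why `.group(1)` comes back as `None` for two of the file names, it can help to print every group for a few tokens. The following is only a minimal sketch: the helper name `show_groups` and the constant `PATTERN` are made up for illustration, and the tokens are taken from the question's file names.

```python
import re

# Same pattern as in the question: five capturing groups, numbered left to right.
PATTERN = re.compile(r"noi(\w+)k|fnoi(\w+)v(\w+)|noi(\w+)v(\w+)")


def show_groups(token):
    """Return the tuple of all five groups for a token, or None if it does not match."""
    match = PATTERN.match(token)
    return match.groups() if match else None


# Groups that belong to an alternative that did not match stay None.
for token in ["noi100k", "fnoi100v5", "noi110v25", "0p55m0p3"]:
    print(token, "->", show_groups(token))
```

Only the groups of the alternative that matched are filled in, so the second and third alternatives do match; their values simply live in groups 2 to 5 rather than in group 1.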
Demo:
>>> import re
>>> files = ["noi100k_0p55m0p3_fow71f",
...          "fnoi100v5_71f60s",
...          "noi100k_0p55m0p3_151f_560s",
...          "noi110v25_560s"]
>>> for i in files:
...     keyws = i.split("_")
...     for j in keyws:
...         match = re.match(r"noi(\w+)k|fnoi(\w+)v(\w+)|noi(\w+)v(\w+)", j)
...         if match:
...             results = match.group(1) or match.group(2) or match.group(4)
...             print("Results :", results)
...
Results : 100
Results : 100
Results : 100
Results : 110
If you don't plan to use the other two captured (\w+) groups, remove their parentheses so that picking the matched group becomes easier:
match = re.match(r"noi(\w+)k|fnoi(\w+)v\w+|noi(\w+)v\w+", j)
if match:
    # Exactly one of the three groups is set; take that one
    results = next(g for g in match.groups() if g)
    print("Results :", results)
This picks the first non-empty matched group.
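When every alternative carries exactly one capturing group, another way to select the value is the match object's `lastindex` attribute, which holds the index of the last group that took part in the match. This is a sketch of that alternative, not part of the original answer; the helper name `extract_number` and the constant `PATTERN` are hypothetical.

```python
import re

# One capturing group per alternative, so exactly one group is set on a match.
PATTERN = re.compile(r"noi(\w+)k|fnoi(\w+)v\w+|noi(\w+)v\w+")


def extract_number(token):
    """Return the captured value of whichever alternative matched, or None."""
    match = PATTERN.match(token)
    if match is None:
        return None
    # lastindex is the index of the last group that participated in the match;
    # with one group per alternative it points at the only group that is set.
    return match.group(match.lastindex)


for token in ["noi100k", "fnoi100v5", "noi110v25", "fow71f"]:
    print(token, "->", extract_number(token))
```

This only works as long as no alternative has more than one group, because `lastindex` would otherwise point at the last group of the alternative rather than the first.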
If you don't mind also accepting fnoi(\w+)k, you can simplify your pattern further:
match = re.match(r"f?noi(\w+)[kv](\w*)", j)
at which point the value you want is always in .group(1).
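As a quick check, the simplified pattern can be run over the file names from the question. This is a minimal sketch; the function name `first_numbers` and the constant `SIMPLE` are invented for the example.

```python
import re

# Simplified pattern from the answer: optional leading "f", then "noi",
# the wanted value in group 1, a "k" or "v", and an optional trailing part.
SIMPLE = re.compile(r"f?noi(\w+)[kv](\w*)")

files = ["noi100k_0p55m0p3_fow71f",
         "fnoi100v5_71f60s",
         "noi100k_0p55m0p3_151f_560s",
         "noi110v25_560s"]


def first_numbers(names):
    """Collect group 1 for every underscore-separated token that matches."""
    found = []
    for name in names:
        for token in name.split("_"):
            match = SIMPLE.match(token)
            if match:
                found.append(match.group(1))
    return found


print(first_numbers(files))
```

Note the trade-off mentioned above: this pattern also accepts a token such as fnoi100k, which the original three alternatives would reject.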