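Why is this regex so lazy? It should return a quoted height/width attribute, then (optionally) whatever lies in between, then (optionally) another height/width attribute. It only gets the first attribute and then quits, even though it could match more.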
((?:height|width)=["']\d*["'])([\s\w:;'"=])*?((?:height|width)=["']\d*["'])?
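The sketch below reproduces the behavior. It assumes Python's re module and a hypothetical sample tag that contains both attributes:

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(
    r"""((?:height|width)=["']\d*["'])([\s\w:;'"=])*?((?:height|width)=["']\d*["'])?"""
)

# Hypothetical sample input with both attributes present.
tag = '<img height="50" width="100">'

m = PATTERN.search(tag)
print(m.group(0))  # height="50"
print(m.groups())  # ('height="50"', None, None)
```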
Answer (score: 6)
The easiest way to see what is happening is to break the regex out into extended format. In extended format, your regex...
((?:height|width)=["']\d*["'])([\s\w:;'"=])*?((?:height|width)=["']\d*["'])?
then becomes (with comments, which are legal in extended format):
( # a group that captures...
(?:height|width) # Height or width
= # The Equals sign
["'] # a double quote or single quote
\d* # zero or more digits 0-9
["'] # a double quote or single quote
) # required
( # zero or more groups, each capturing one whitespace char,
[\s\w:;'"=] # letter, digit, underscore, colon, semicolon, single quote, double quote, or equals sign
)*? # zero or more times, lazily (matching as few times as it can)
( # a group that captures...
(?:height|width) # height or width
= # the equals sign
["'] # a double quote or single quote
\d* # zero or more digits
["'] # a double quote or single quote
)? # optionally
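Python's re.VERBOSE flag implements this extended format, so you can check the commented version directly. The sketch below assumes Python and the same hypothetical sample tag. Its comments are lightly reworded from the listing above. It confirms that the commented form and the one-line form return the same groups:

```python
import re

# The commented ("extended") form of the question's pattern. re.VERBOSE ignores
# unescaped whitespace outside character classes and treats "#" as a comment.
VERBOSE = re.compile(
    r"""
    (                   # group 1: the first attribute
      (?:height|width)  # height or width
      =                 # the equals sign
      ["']              # a double or single quote
      \d*               # zero or more digits 0-9
      ["']              # a double or single quote
    )
    (                   # group 2, repeated:
      [\s\w:;'"=]       # one whitespace char, letter, digit, _, :, ;, quote or =
    )*?                 # zero or more times, lazily (as few as possible)
    (                   # group 3: the second attribute
      (?:height|width)
      =
      ["']
      \d*
      ["']
    )?                  # optional
    """,
    re.VERBOSE,
)

ONE_LINE = re.compile(
    r"""((?:height|width)=["']\d*["'])([\s\w:;'"=])*?((?:height|width)=["']\d*["'])?"""
)

tag = '<img height="50" width="100">'  # hypothetical sample input
a = VERBOSE.search(tag)
b = ONE_LINE.search(tag)
print(a.groups() == b.groups())  # True
```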
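So, depending on the regex engine you are using, the repeated middle group may report anything from a single capture up to N captures (one per repetition), and your last group is the one you want, if it matched at all. The early stop comes from the lazy *?. It first tries zero repetitions, and since the final group is optional it can match the empty string, so the engine declares success right after the first attribute. Remove the lazy modifier (?) from the second group and make that group non-capturing, like this: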
( # a group that captures...
(?:height|width) # Height or width (non capturing)
= # The Equals sign
["'] # a double quote or single quote
\d* # zero or more digits 0-9
["'] # a double quote or single quote
) # required
(?: # zero or more non-capturing groups of whitespace chars, letters,
[\s\w:;'"=] # digits, underscore, colon, semicolon, single quote, double quote, or equals sign
)* # zero or more times, as many as it can, UNTIL...
( # a group that captures...
(?:height|width) # height or width (non-capturing)
= # the equals sign
["'] # a double quote or single quote
\d* # zero or more digits
["'] # a double quote or single quote
)? # optional
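The intent is that the first and last attributes end up in groups 1 and 2 respectively, with whatever lies between them ignored. The last one is captured only if it is present.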
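A quick way to check this is to run the revised pattern through Python's re, again with a hypothetical sample tag. In this run the middle class also accepts letters, digits, quotes and =, so it greedily consumes the text width="100" itself. The last group is optional, so the engine accepts the match at that point without backtracking, and group 2 comes back empty:

```python
import re

# The revised pattern: the middle repetition is greedy and non-capturing.
REVISED = re.compile(
    r"""
    ((?:height|width)=["']\d*["'])   # group 1: first attribute
    (?:[\s\w:;'"=])*                 # middle, greedy, not captured
    ((?:height|width)=["']\d*["'])?  # group 2: optional second attribute
    """,
    re.VERBOSE,
)

tag = '<img height="50" width="100">'  # hypothetical sample input
m = REVISED.search(tag)
print(m.group(0))  # height="50" width="100"
print(m.groups())  # ('height="50"', None)
```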
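Note: it may not capture the last part, because the middle group only allows the characters listed in its character class. A comma, a #, or any other punctuation character is not covered by that class. You could try replacing the middle part with something like this: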
["'] # a double quote or single quote
) # required
.* # Anything, zero or more times, UNTIL...
( # a group that...
(?:height|width) # height or width (non-capturing)
and see whether THAT matches. If it does, you may need to extend the set of characters allowed in the middle group.
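The same kind of check applies to the .* version. The sketch below uses Python's re and hypothetical sample tags. It shows that .* also swallows the second attribute, because the optional final group is allowed to match nothing. The sketch also tries one alternative arrangement, which is not the pattern from this answer. It keeps the middle lazy but makes the second attribute mandatory inside an optional non-capturing wrapper, so the engine must either reach a second attribute or drop the whole tail:

```python
import re

# One quoted height/width attribute, as a capturing group.
ATTR = r"""((?:height|width)=["']\d*["'])"""

# The ".*" diagnostic: a greedy middle followed by an optional last attribute.
DOT_STAR = re.compile(ATTR + r".*" + ATTR + "?")

# Alternative arrangement (not the answer's pattern): a lazy middle, with the
# second attribute required inside an optional non-capturing wrapper.
WRAPPED = re.compile(ATTR + r"""(?:[\s\w:;'"=]*?""" + ATTR + r")?")

# Hypothetical sample tags.
for tag in ('<img height="50" width="100">', '<img height="50">'):
    print(DOT_STAR.search(tag).groups(), WRAPPED.search(tag).groups())
```

With the wrapper, a missing second attribute leaves group 2 as None. When the second attribute is present, the lazy middle stops at the first height/width attribute it can reach.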
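If you don't care how many repetitions happen in the middle and just want to capture it, match each piece with a non-capturing group and use one capturing group around the whole middle run: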
["'] # a double quote or single quote
) # required
( # a group that captures...
(?: # zero or more non-capturing groups of whitespace chars, letters,
[\s\w:;'"=] # digits, underscore, colon, semicolon, single quote, double quote, or equals sign
)* # zero or more times as much as it can
) # UNTIL...
( # a group that captures...
(?:height|width) # height or width (non-capturing)
Now you get a fixed number of captures: the first part is always in group 1, the middle part always in group 2, and the last part (if present) in group 3.
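A compiled Python pattern reports its group count, which makes the fixed-capture point easy to confirm. The sketch assumes the fragment above is completed with the same first and last attribute groups used earlier, and it uses hypothetical sample tags. With the first tag, the greedy middle group again absorbs the second attribute:

```python
import re

THREE = re.compile(
    r"""
    ((?:height|width)=["']\d*["'])    # group 1: first attribute
    ((?:[\s\w:;'"=])*)                # group 2: everything in the middle
    ((?:height|width)=["']\d*["'])?   # group 3: optional last attribute
    """,
    re.VERBOSE,
)

print(THREE.groups)  # 3, however many times the middle repeats

# Hypothetical sample tags.
for tag in ('<img height="50" width="100">', '<img height="50">'):
    print(THREE.search(tag).groups())
```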