Question

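Why is this regex so lazy? It should return a height/width attribute in quotes, whatever is in between (optional), and then another height/width attribute (optional). It only gets the first attribute and then quits, even though it could match more.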

((?:height|width)=["']\d*["'])([\s\w:;'"=])*?((?:height|width)=["']\d*["'])?

sample code on regexpal

Answer 1

The easiest way to see what is happening is to break the regex out into extended format. In extended format, your regex...

((?:height|width)=["']\d*["'])([\s\w:;'"=])*?((?:height|width)=["']\d*["'])?

then becomes (with comments, which are legal in extended format):

(                     # a group that captures...
    (?:height|width)  # Height or width
    =                 # The Equals sign
    ["']              # a double quote or quote
    \d*               # zero or more digits 0-9
    ["']              # a double quote or quote
)                     # required
(                     # zero or more groups that capture...space chars,
    [\s\w:;'"=]       # letters, numbers, colon, semicolon, quote, double quote, and equals
)*?                   # zero or more times, lazily (giving up as much as it can)
(                     # a group that...
    (?:height|width)  # height or width
    =                 # the equals sign
    ["']              # double quote or quote
    \d*               # zero or more numbers
    ["']              # double quote or quote
)?                    # optionally
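
To make the lazy behaviour concrete, here is a minimal sketch that runs the same pattern in extended format. It assumes Python's `re` module (the question does not say which engine is in use) and a made-up sample string:

```python
import re

# Hypothetical sample input, not taken from the question.
SAMPLE = '<img height="100" style="border:0;" width="200">'

LAZY = re.compile(r"""
    ((?:height|width)=["']\d*["'])    # group 1: first attribute (required)
    ([\s\w:;'"=])*?                   # group 2: one char per iteration, lazy
    ((?:height|width)=["']\d*["'])?   # group 3: second attribute (optional)
""", re.VERBOSE)

m = LAZY.search(SAMPLE)
print(m.group(0))   # height="100"
print(m.groups())   # ('height="100"', None, None)
```

The lazy group is satisfied with zero repetitions and the last group is optional, so in this engine the match ends right after the first attribute.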

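So your regex can produce anywhere from 1 group up to N groups, depending on the regex engine you are using. The last group will be the one you want, if it is there. The match stops early because the lazy middle group is allowed to match zero characters and the final group is optional, so the engine can declare success right after the first attribute. Remove the lazy modifier (the ?) from the second group and make that group non-capturing, like this: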

(                     # a group that captures...
    (?:height|width)  # Height or width (non capturing)
    =                 # The Equals sign
    ["']              # a double quote or quote
    \d*               # zero or more digits 0-9
    ["']              # a double quote or quote
)                     # required
(?:                   # zero or more groups of space chars, letters,
    [\s\w:;'"=]       # numbers, colon, semicolon, quote, double quote, and equals
)*                    # zero or more times as much as it can UNTIL...
(                     # a group that captures...
    (?:height|width)  # height or width (non-capturing)
    =                 # the equals sign
    ["']              # double quote or quote
    \d*               # zero or more numbers
    ["']              # double quote or quote
)?                    # optional

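Now the first and last attributes are in groups 1 and 2 respectively, and whatever sits in between is ignored. If there is a last one, it will be captured.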
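
It is worth checking this in your own engine. In the sketch below, which assumes Python's `re` module and a made-up sample string, the greedy middle group swallows the second attribute as well, because every character of `width="200"` belongs to the class; the optional last group then matches nothing. The second pattern is one possible workaround, not part of the answer above: keep the middle lazy, but wrap it together with the second attribute in a single optional group.

```python
import re

# Hypothetical sample input; ATTR is just a helper name for this sketch.
SAMPLE = 'height="100" style="border:0;" width="200"'
ATTR = r"""(?:height|width)=["']\d*["']"""

# Greedy, non-capturing middle followed by an optional last group.
greedy = re.search(rf"""({ATTR})(?:[\s\w:;'"=])*({ATTR})?""", SAMPLE)
print(greedy.groups())    # ('height="100"', None)

# Assumed workaround: lazy middle and second attribute share one optional group.
wrapped = re.search(rf"""({ATTR})(?:[\s\w:;'"=]*?({ATTR}))?""", SAMPLE)
print(wrapped.groups())   # ('height="100"', 'width="200"')
```

With the wrapped form, the lazy middle has to keep expanding until the second attribute matches, and the whole tail is simply skipped when no second attribute exists.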

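NOTE: It is probably not capturing the last part because the middle group's character class does not list every character that can appear in between. A comma, a #, or any other punctuation character is not covered by that class. You might try replacing the middle group with the following: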

    ["']              # a double quote or quote
)                     # required
.*                    # Anything, zero or more times, UNTIL...
(                     # a group that...
    (?:height|width)  # height or width (non-capturing)

and see whether that DOES match. If it does, you probably need to extend the middle group's character class further.
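
To see where the middle group actually stops, it can help to run the character-class version and the `.*` version on the same input. A rough sketch, assuming Python's `re` module and a made-up sample string that contains a `#`:

```python
import re

# Hypothetical input; the '#' is not in the class [\s\w:;'"=].
SAMPLE = 'height="100" style="color:#fff" width="200"'
ATTR = r"""(?:height|width)=["']\d*["']"""

with_class = re.search(rf"""({ATTR})(?:[\s\w:;'"=])*({ATTR})?""", SAMPLE)
with_dot = re.search(rf"""({ATTR}).*({ATTR})?""", SAMPLE)

print(with_class.group(0))  # height="100" style="color:
print(with_dot.group(0))    # the whole sample string
```

If the `.*` version reaches further than the class version, some character in between is missing from the class.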

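If you don't care how many matches occur in the middle group, only that it is captured, use a non-capturing group for each individual character and then wrap the whole repetition in one capturing group: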

    ["']              # a double quote or quote
)                     # required
(                     # a group that captures...
    (?:               # zero or more groups of space chars, letters, 
        [\s\w:;'"=]   # numbers, colon, semicolon, quote, double quote, and equals
    )*                # zero or more times as much as it can
)                     # UNTIL...
(                     # a group that captures...
    (?:height|width)  # height or width (non-capturing)

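Now you will get a fixed number of captures: the first part is always in group 1, the middle part is always in group 2, and the last part (if it is there) is in group 3.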
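
A small sketch of that fixed layout, again assuming Python's `re` module and made-up inputs:

```python
import re

FIXED = re.compile(r"""
    ((?:height|width)=["']\d*["'])    # group 1: first attribute
    ((?:[\s\w:;'"=])*)                # group 2: the whole middle run
    ((?:height|width)=["']\d*["'])?   # group 3: last attribute, if matched
""", re.VERBOSE)

# Hypothetical inputs, each with a single height/width attribute.
results = [FIXED.search(text).groups()
           for text in ('height="100"', 'width="5" alt="x"')]
print(results)
# [('height="100"', '', None), ('width="5"', ' alt="x"', None)]
```

Every match yields exactly three groups; in Python a group that did not take part in the match comes back as `None`.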
