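Regex: optional class attribute

Time: 2012-03-13 10:54:28

Tags: regex

I'm trying to capture the attributes of a <pre> tag, together with an optional class attribute. I'd like to capture the class value within a single regex, rather than capturing all the attributes and then searching them for the class value afterwards. Since the class attribute is optional, I tried adding a ?, but then the regex below matches using only the last capture group: the class isn't captured, and neither are the attributes before it.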

// Works, but class isn't optional
'(?<!\$)<pre([^\>]*?)(\bclass\s*=\s*(["\'])(.*?)\3)([^\>]*)>'

// Fails to match class, the whole set of attributes are matched by last group
'(?<!\$)<pre([^\>]*?)(\bclass\s*=\s*(["\'])(.*?)\3)?([^\>]*)>'

e.g. <pre style="..." class="some-class" title="stuff">
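
The behaviour described above can be reproduced with a minimal sketch. It compares the required form with a variant in which `?` follows the whole class group; the `#` delimiters and the variable names are illustrative choices, not part of the original patterns. With the optional variant, the lazy first group is satisfied by an empty match, the class group fails right after `<pre` and is simply skipped, and the greedy last group then consumes every attribute.

```php
<?php
// Sample input: the example tag from the question.
$tag = '<pre style="..." class="some-class" title="stuff">';

// Class group required: the lazy first group must expand until "class" matches.
$required = '#(?<!\$)<pre([^\>]*?)(\bclass\s*=\s*(["\'])(.*?)\3)([^\>]*)>#';

// Class group optional: the "?" after the group is the only difference.
$optional = '#(?<!\$)<pre([^\>]*?)(\bclass\s*=\s*(["\'])(.*?)\3)?([^\>]*)>#';

preg_match($required, $tag, $m_required);
preg_match($optional, $tag, $m_optional);

// Group 4 is the class value, group 5 is whatever the last group consumed.
var_dump($m_required[4], $m_required[5]);
var_dump($m_optional[4], $m_optional[5]);
```

In the optional case, group 4 comes back as an empty string and group 5 holds the full attribute list, class included. Nothing forces the regex engine to backtrack into the lazy group, because skipping the optional group already yields a successful match.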

Edit

I ended up using this:

$wp_content = preg_replace_callback('#(?<!\$)<\s*pre(?=(?:([^>]*)\bclass\s*=\s*(["\'])(.*?)\2([^>]*))?)([^>]*)>(.*?)<\s*/\s*pre\s*>#msi', 'CrayonWP::pre_tag', $wp_content);

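It allows whitespace inside the tags, captures what comes before and after the class attribute separately, and also captures all of the attributes.

The callback then puts things in place: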

public static function pre_tag($matches) {
    $pre_class = $matches[1];
    $quotes = $matches[2];
    $class = $matches[3];
    $post_class = $matches[4];
    $atts = $matches[5];
    $content = $matches[6];
    if (!empty($class)) {
        // Allow hyphenated "setting-value" style settings in the class attribute
        $class = preg_replace('#\b([A-Za-z-]+)-(\S+)#msi', '$1='.$quotes.'$2'.$quotes, $class);
        return "[crayon $pre_class $class $post_class] $content [/crayon]";
    } else {
        return "[crayon $atts] $content [/crayon]";
    }
}
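
To check how the pattern and the callback work together, here is a self-contained sketch. `PreTagDemo` is a hypothetical stand-in for the `CrayonWP` class (only the callback logic shown above is reproduced), and the two input strings are made-up samples.

```php
<?php
// Hypothetical stand-in for CrayonWP; same callback logic as in the post.
class PreTagDemo {
    public static function pre_tag($matches) {
        // Groups: 1 = before class, 2 = quote char, 3 = class value,
        //         4 = after class, 5 = all attributes, 6 = tag content.
        list(, $pre_class, $quotes, $class, $post_class, $atts, $content) = $matches;
        if (!empty($class)) {
            // Turn "setting-value" into setting="value".
            $class = preg_replace('#\b([A-Za-z-]+)-(\S+)#msi', '$1=' . $quotes . '$2' . $quotes, $class);
            return "[crayon $pre_class $class $post_class] $content [/crayon]";
        }
        return "[crayon $atts] $content [/crayon]";
    }
}

// Same pattern as in the preg_replace_callback call above.
$pattern = '#(?<!\$)<\s*pre(?=(?:([^>]*)\bclass\s*=\s*(["\'])(.*?)\2([^>]*))?)([^>]*)>(.*?)<\s*/\s*pre\s*>#msi';

// Sample inputs: one tag with a class attribute, one without.
$with    = preg_replace_callback($pattern, 'PreTagDemo::pre_tag', '<pre title="t" class="lang-php">echo 1;</pre>');
$without = preg_replace_callback($pattern, 'PreTagDemo::pre_tag', '<pre title="t">echo 2;</pre>');

echo $with, "\n";
echo $without, "\n";
```

With the class present, `lang-php` is rewritten to `lang="php"` and the surrounding attributes are kept. Without it, groups 1 to 4 come back as empty strings, so the `else` branch falls back to the full attribute list in group 5.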

1 answer:

Answer 0 (score: 4)

You can put the capturing group for the class attribute inside a lookahead assertion and make it optional:

'(?<!\$)<pre(?=(?:[^>]*\bclass\s*=\s*(["\'])(.*?)\1)?)([^>]*)>'

Now $2 will contain the value of the class attribute, if there is one.

(?<!\$)               # Assert no preceding $ (why?)
<pre                  # Match <pre
(?=                   # Assert that the following can be matched:
 (?:                  # Try to match this:
  [^>]*               #  any text except >
   \bclass\s*=\s*     #  class =
   (["\'])            #  opening quote
   (.*?)              #  any text, lazy --> capture this in group no. 2
   \1                 #  corresponding closing quote
 )?                   # but make the whole thing optional.
)                     # End of lookahead
([^\>]*)>             # Match the entire contents of the tag and the closing >
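
As a quick check of the lookahead approach, the following sketch runs the pattern against two made-up tags, one with a class attribute and one without. The `#` delimiters and the variable names are assumptions added for the example.

```php
<?php
// Pattern from the answer, wrapped in "#" delimiters for preg_match.
$pattern = '#(?<!\$)<pre(?=(?:[^>]*\bclass\s*=\s*(["\'])(.*?)\1)?)([^>]*)>#';

// Sample tags: one with a class attribute, one without.
$with_class    = '<pre style="..." class="some-class" title="stuff">';
$without_class = '<pre style="...">';

preg_match($pattern, $with_class, $m);
// $m[2] is the class value; $m[3] is every attribute, class included.
var_dump($m[2], $m[3]);

preg_match($pattern, $without_class, $n);
// The optional group inside the lookahead did not participate,
// so preg_match reports groups 1 and 2 as empty strings.
var_dump($n[2], $n[3]);
```

Because the lookahead consumes no input, group 3 always receives the complete attribute string, whether or not the class group inside the lookahead matched.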