Question

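I'm trying to parse HTML code and extract all CSS classes and IDs from the source. So I need to extract whatever is between the two quotes when it is preceded by either class or id: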

id="<extract this>"

class="<extract this>"

Answer 1

/(?:id|class)="([^"]*)"/gi

Replacement expression: $1

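The regex in plain English: match "id" or "class", followed by an equals sign and a quote, then capture everything that is not a quote until the next quote is matched. Do this globally and case-insensitively.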
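A minimal sketch of applying Answer 1's pattern in JavaScript, assuming the input is an HTML string. The function name `extractIdsAndClasses` and the sample markup are made up for illustration; `String.prototype.matchAll` yields every match, and group 1 of each match holds the attribute value.

```javascript
// Answer 1's pattern: g = find all matches, i = case-insensitive.
const pattern = /(?:id|class)="([^"]*)"/gi;

// Hypothetical helper: return the captured value (group 1) of every match.
function extractIdsAndClasses(html) {
  return Array.from(html.matchAll(pattern), (m) => m[1]);
}

// Made-up sample input.
const html = '<div id="header" class="nav main"><p CLASS="intro">Hi</p></div>';
console.log(extractIdsAndClasses(html));
// [ 'header', 'nav main', 'intro' ]
```

The `i` flag is what lets `CLASS="intro"` match, and a class attribute that lists several names comes back as a single string (`'nav main'`).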

Answer 2

Since you prefer to use a regular expression, here is one way to do it:

\b(?:id|class)\s*=\s*"([^"]*)"

The regular expression, explained:

\b             # the boundary between a word char (\w) and not a word char
(?:            # group, but do not capture:
  id           #   'id'
 |             # OR
  class        #   'class'
)              # end of grouping
\s*            # whitespace (\n, \r, \t, \f, and " ") (0 or more times)
=              # '='
\s*            # whitespace (\n, \r, \t, \f, and " ") (0 or more times)
"              # '"'
(              # group and capture to \1:
  [^"]*        #   any character except: '"' (0 or more times)
)              # end of \1
"              # '"'
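To show what the leading `\b` and the `\s*` around `=` change, here is a hedged JavaScript comparison of Answer 1's and Answer 2's patterns on one made-up input. The helper name `captures` is hypothetical.

```javascript
// Answer 2's pattern: word boundary before the name, optional whitespace around '='.
const answer2 = /\b(?:id|class)\s*=\s*"([^"]*)"/gi;
// Answer 1's pattern, for comparison.
const answer1 = /(?:id|class)="([^"]*)"/gi;

// Hypothetical helper: collect group 1 of every match of `pattern` in `text`.
function captures(pattern, text) {
  return Array.from(text.matchAll(pattern), (m) => m[1]);
}

const sample = '<input valid="no" id = "email" class="field">';
console.log(captures(answer1, sample)); // [ 'no', 'field' ]
console.log(captures(answer2, sample)); // [ 'email', 'field' ]
```

Answer 1's pattern picks up the tail of `valid="no"` and misses `id = "email"`; the word boundary and optional whitespace in Answer 2's pattern handle both cases.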

Answer 3

You might want to try this:

<?php

$css = <<< EOF
id="<extract this>"
class="<extract this>"id="<extract this2>"
class="<extract this3>"id="<extract this4>"
class="<extract this5>"id="<extract this6>"
class="<extract this7>"id="<extract this8>"
class="<extract this9>"
EOF;

// Capture the value of every id="..." or class="..." attribute into group 1.
preg_match_all('/(?:id|class)="(.*?)"/sim', $css, $classes, PREG_PATTERN_ORDER);

// $classes[0] holds the full matches; $classes[1] holds the captured values.
foreach ($classes[1] as $value) {
    echo $value . "\n";
}
/* Output:
<extract this>
<extract this>
<extract this2>
<extract this3>
<extract this4>
<extract this5>
<extract this6>
<extract this7>
<extract this8>
<extract this9>
*/
?>

Sample:
http://ideone.com/Nr9FPt
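If the goal is a list of individual CSS class names, note that a class attribute can hold several whitespace-separated names, which all three patterns return as one string. The sketch below is a hypothetical helper (`collectSelectors` does not come from any of the answers) that reuses Answer 2's pattern, captures the attribute name too, splits class values on whitespace and de-duplicates with `Set`. Like the answers, it assumes double-quoted attribute values.

```javascript
// Hypothetical helper: unique IDs and individual class names from an HTML string.
// Assumes attribute values are double-quoted, as in the question.
function collectSelectors(html) {
  // Like Answer 2's pattern, but the attribute name is captured as group 1.
  const attr = /\b(id|class)\s*=\s*"([^"]*)"/gi;
  const ids = new Set();
  const classes = new Set();
  for (const [, name, value] of html.matchAll(attr)) {
    if (name.toLowerCase() === "id") {
      const id = value.trim();
      if (id) ids.add(id);
    } else {
      // A class attribute may list several whitespace-separated names.
      for (const cls of value.split(/\s+/)) {
        if (cls) classes.add(cls);
      }
    }
  }
  return { ids: [...ids], classes: [...classes] };
}

const page = '<div id="top" class="box wide"><span class="box">x</span></div>';
console.log(collectSelectors(page));
// { ids: [ 'top' ], classes: [ 'box', 'wide' ] }
```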

Regex to match IDs and classes in a CSS page
