Question

I have the following HTML snippet and regular expression, and I'm trying to match all script tags that do not have a 'data-capture="0"' attribute.

$html = '
 <p>
    <script src="//test.com/should-be-matched-but-it-is-not-library.js"></script>
    <script type="text/javascript" src="//test.com/should-not-be-matched-but-it-is-library.js" data-capture="0"></script>
    <script type="text/javascript" src="//test.com/correctly-matched-library.js"></script>
    <script type="text/javascript">var foo = "correctly matched";</script>
    <script>var bar = "correctly matched again";</script>
    <script data-capture="0">var baz= "correctly not be matched";</script>
    <script src="//test.com/correctly-not-matched-library.js" data-capture="0"></script>
 </p>
';
preg_match_all('/((<script(?: type="text\/javascript"(?! data-capture="0")).*?>|<script>).*?<\/script>)/s', $html, $matches);

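However, the first script is not matched (it should be) and the second one is matched (it should not be), and I can't understand why. Can anyone offer a suggestion?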

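I know that regular expressions are not reliable for parsing HTML. Let's treat this as a general regex case rather than an attempt to build an HTML parser.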
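A minimal reproduction, assuming PHP 7 or later run from the command line: it feeds the sample markup to the question's pattern unchanged and prints the opening tag captured by the inner group ($matches[2]) for each match. The echo loop is an illustrative addition for observing the behavior, not part of the original code.

```php
<?php
// Reproduce the behavior described in the question: run the original
// pattern over the sample markup and print each matched opening tag.

$html = '
 <p>
    <script src="//test.com/should-be-matched-but-it-is-not-library.js"></script>
    <script type="text/javascript" src="//test.com/should-not-be-matched-but-it-is-library.js" data-capture="0"></script>
    <script type="text/javascript" src="//test.com/correctly-matched-library.js"></script>
    <script type="text/javascript">var foo = "correctly matched";</script>
    <script>var bar = "correctly matched again";</script>
    <script data-capture="0">var baz= "correctly not be matched";</script>
    <script src="//test.com/correctly-not-matched-library.js" data-capture="0"></script>
 </p>
';

// The pattern from the question, unchanged.
$pattern = '/((<script(?: type="text\/javascript"(?! data-capture="0")).*?>|<script>).*?<\/script>)/s';

preg_match_all($pattern, $html, $matches);

// Group 1 is the whole element; group 2 is the opening tag alternative.
foreach ($matches[2] as $openingTag) {
    echo $openingTag, PHP_EOL;
}
```

Run as-is, it prints four opening tags. The tag carrying both type="text/javascript" and data-capture="0" is among them, while the first <script src=...> tag is not, because neither alternative of the pattern can start at '<script src='.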

Answer 1

Why not this:

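preg_match_all(
     // [^>]* lets the lookahead scan the whole opening tag, so data-capture="0"
     // is rejected wherever it appears among the attributes, not only first
     '/((<script(?![^>]* data-capture="0").*?>|<script>).*?<\/script>)/s',
     $html,
     $matches
);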

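The first one has no type attribute, while your regex requires one right after '<script'. The second one is matched because your negative lookahead (?! data-capture="0") is tested only at the single position right after type="text/javascript"; there the next text is ' src=', so the lookahead succeeds and .*?> then consumes the rest of the tag, data-capture="0" included.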
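The reasoning above can be checked with a self-contained sketch of a pattern that applies the lookahead to the whole opening tag. It assumes attribute values never contain a literal '>' (in keeping with treating this as a plain regex case); the \s before data-capture accepts any whitespace rather than one space, and the pattern and variable names are illustrative choices, not taken from the thread.

```php
<?php
// Sketch: reject any <script> whose opening tag contains data-capture="0",
// wherever that attribute sits. Assumes attribute values never contain ">".

$html = '
 <p>
    <script src="//test.com/should-be-matched-but-it-is-not-library.js"></script>
    <script type="text/javascript" src="//test.com/should-not-be-matched-but-it-is-library.js" data-capture="0"></script>
    <script type="text/javascript" src="//test.com/correctly-matched-library.js"></script>
    <script type="text/javascript">var foo = "correctly matched";</script>
    <script>var bar = "correctly matched again";</script>
    <script data-capture="0">var baz= "correctly not be matched";</script>
    <script src="//test.com/correctly-not-matched-library.js" data-capture="0"></script>
 </p>
';

// (?![^>]*\sdata-capture="0") is checked right after "<script" and scans
// forward only up to the first ">", i.e. across the opening tag alone.
$pattern = '/<script(?![^>]*\sdata-capture="0")[^>]*>.*?<\/script>/s';

preg_match_all($pattern, $html, $matches);

// $matches[0] holds each full <script>...</script> element that passed.
foreach ($matches[0] as $scriptElement) {
    echo $scriptElement, PHP_EOL;
}
```

With the question's sample it matches four elements: the first script plus the three the question marks as correctly matched, and none of the three tags carrying data-capture="0". Using [^>]* rather than .*? for the attributes keeps both the lookahead and the tag match from reading past the current tag's '>' into the next tag.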

Trying to match script tags in HTML with a regular expression
