Conditional lookahead for preg_match

Posted: 2016-07-11 18:54:25

Tags: php regex preg-match preg-match-all

I have the following code to extract JavaScript code:

preg_match_all('#<script(?:[^>]+)?>(.*?)</script>#is', $GLOBALS['content'], $matches, PREG_SET_ORDER);

It works great for these:

<script type="text/javascript">
<script type="application/javascript">
<script>

But how do I avoid matching this one?

<script type="application/ld+json">
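The problem can be reproduced with a minimal sketch. It assumes a local $content string in place of $GLOBALS['content'], with made-up script bodies. Running the question's pattern over the four tags shows that the ld+json block is captured along with the JavaScript ones, because [^>]+ accepts any attributes at all.

```php
<?php
// Hypothetical sample input standing in for $GLOBALS['content'].
$content = <<<HTML
<script type="text/javascript">var a = 1;</script>
<script type="application/javascript">var b = 2;</script>
<script>var c = 3;</script>
<script type="application/ld+json">{"@type": "Person"}</script>
HTML;

// The question's pattern: any <script ...> tag, lazily capturing its body.
preg_match_all('#<script(?:[^>]+)?>(.*?)</script>#is', $content, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m[1], "\n"; // group 1 holds the script body, JSON-LD included
}
```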

3 answers:

Answer 0 (score: 2)

Either do as @Wiktor says (use a negative lookahead) or use a parser:

<?php

$data = <<<DATA
<script type="text/javascript">some js code here</script>
<script type="application/javascript">some other code here</script>
<script>This looks naked, dude!</script>
<script type="application/ld+json">THIS MUST NOT BE MATCHED</script>
DATA;

$dom = new DOMDocument();
$dom->loadHTML($data);

$xpath = new DOMXPath($dom);
$scripts = $xpath->query("//script[not(@type='application/ld+json')]");
foreach ($scripts as $script) {
    echo $script->textContent, "\n"; // body of each script that is not JSON-LD
}
?>
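The parser approach can also be wrapped in a reusable function. This is a sketch only: the function name extractNonJsonLdScripts and the sample markup are hypothetical, and it assumes the dom extension is enabled. It uses the same XPath query as above, keeps libxml's parse warnings out of the output, and returns the text of every matching script element.

```php
<?php
// Hypothetical helper: text of every <script> whose type is not application/ld+json.
function extractNonJsonLdScripts(string $html): array
{
    $dom = new DOMDocument();
    // libxml warns about markup it does not recognise (e.g. some HTML5 tags);
    // collect those warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $bodies = [];
    foreach ($xpath->query("//script[not(@type='application/ld+json')]") as $script) {
        $bodies[] = $script->textContent;
    }
    return $bodies;
}

$html = '<script type="text/javascript">a()</script>'
      . '<script type="application/ld+json">{"x": 1}</script>'
      . '<script>b()</script>';

print_r(extractNonJsonLdScripts($html)); // a() and b(); the ld+json body is excluded
```

Untyped scripts are kept because not(@type='application/ld+json') is true when the attribute is missing.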

Answer 1 (score: 1)

The following should work:

<script(?!\stype="application\/ld\+json")[^>]*>(.*?)<\/script>

It uses a negative lookahead to exclude the unwanted JSON. You might not need to escape the forward slashes (with a delimiter other than /, such as #, they need no escaping). But you do need to escape the + sign in ld+json so that it is matched literally rather than treated as a quantifier.
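A minimal PHP sketch of this pattern follows, using hypothetical sample markup. With # as the delimiter, the forward slashes are left unescaped. One caveat worth checking: the lookahead sits directly after <script, so it only rejects tags where type="application/ld+json" is the first attribute. The second pattern is an assumed variant, not part of the answer, that lets the lookahead scan the whole opening tag.

```php
<?php
$content = '<script type="text/javascript">js1()</script>'
         . '<script type="application/ld+json">{"a": 1}</script>'
         . '<script async type="application/ld+json">{"b": 2}</script>'
         . '<script>js2()</script>';

// The answer's pattern; "/" needs no escaping with # delimiters, "+" still does.
$strict = '#<script(?!\stype="application/ld\+json")[^>]*>(.*?)</script>#is';

// Assumed variant: [^>]* inside the lookahead lets it inspect every attribute
// of the opening tag without running past its closing ">".
$anywhere = '#<script(?![^>]*\stype="application/ld\+json")[^>]*>(.*?)</script>#is';

preg_match_all($strict, $content, $m1);
preg_match_all($anywhere, $content, $m2);

print_r($m1[1]); // js1(), {"b": 2}, js2(): the async ld+json tag slips through
print_r($m2[1]); // js1(), js2()
```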

See it in action: RegEx101

Please comment if this needs adjustment or further detail.

Answer 2 (score: 0)

Taking the opposite approach to @Wiktor's solution, to match only JavaScript...

<script type="text/javascript">...</script>
<script type="application/javascript">...</script>
<script>...</script>

...and skip any other types, use:

#<script(?:[^>]*\stype="(?:application|text)/javascript"[^>]*)?>(.*?)</script>#is
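A quick way to check a whitelist pattern of this kind is to run it over sample tags (the markup below is hypothetical). In this sketch the trailing [^>]* sits inside the optional group, so a tag with attributes must carry a JavaScript type to match. If that [^>]* is placed after the group instead, the type check becomes optional and <script type="application/ld+json"> is matched as well.

```php
<?php
$content = '<script type="text/javascript">a()</script>'
         . '<script type="application/javascript">b()</script>'
         . '<script>c()</script>'
         . '<script type="application/ld+json">{"d": 4}</script>'
         . '<script type="text/javascript" defer>e()</script>';

// Whitelist: either a JavaScript type attribute somewhere in the tag
// (other attributes allowed around it), or a bare <script> with no attributes.
$js = '#<script(?:[^>]*\stype="(?:application|text)/javascript"[^>]*)?>(.*?)</script>#is';

preg_match_all($js, $content, $matches);
print_r($matches[1]); // a(), b(), c(), e(); the ld+json body is skipped
```

Note that a whitelist this strict also skips untyped scripts that carry other attributes, such as <script src="app.js">; whether that is acceptable depends on the input.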