I have the following code to extract JavaScript code:
preg_match_all('#<script(?:[^>]+)?>(.*?)</script>#is', $GLOBALS['content'], $matches, PREG_SET_ORDER);
It works well for these tags:
<script type="text/javascript">
<script type="application/javascript">
<script>
But how do I avoid matching this one?
<script type="application/ld+json">
Answer 0 (score: 2)
Either use a negative lookahead, as @Wiktor suggests, or use a parser:
<?php
$data = <<<DATA
<script type="text/javascript">some js code here</script>
<script type="application/javascript">some other code here</script>
<script>This looks naked, dude!</script>
<script type="application/ld+json">THIS MUST NOT BE MATCHED</script>
DATA;
$dom = new DOMDocument();
$dom->loadHTML($data);
$xpath = new DOMXPath($dom);
$scripts = $xpath->query("//script[not(@type='application/ld+json')]");
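foreach ($scripts as $script) {
    // The text content of each remaining <script> element is its JavaScript source.
    echo $script->textContent, PHP_EOL;
}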
?>
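For markup that is less tidy than the sample, the parser approach can be wrapped in a small function. This is only a sketch: `extractJavaScript` is a hypothetical helper name, not part of the question's code, and it assumes the input is a non-empty HTML string. It silences libxml warnings while parsing and returns the script bodies as an array.

```php
<?php
// Hypothetical helper: returns the bodies of all <script> elements except JSON-LD.
function extractJavaScript(string $html): array
{
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query("//script[not(@type='application/ld+json')]");

    $bodies = [];
    foreach ($nodes as $node) {
        $bodies[] = $node->textContent;
    }
    return $bodies;
}

$html = '<script>alert(1);</script>'
      . '<script type="application/ld+json">{"@type":"Thing"}</script>';

print_r(extractJavaScript($html));
```

The XPath test compares the attribute value exactly, so a differently cased or padded `type` value would need a looser predicate.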
Answer 1 (score: 1)
The following should work:
<script(?!\stype="application\/ld\+json")[^>]*>(.*?)<\/script>
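It uses a negative lookahead to exclude the unwanted JSON. You might not need to escape the forward slashes (with # as the delimiter, as in the question, you do not). But you do need to escape the + sign in ld+json so that it is treated as a literal character rather than as a quantifier. Note that the lookahead only rejects the tag when type is the first attribute, directly after <script.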
See it in action: RegEx101
Please comment if this needs adjustment or further detail.
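To try the lookahead idea from PHP, a sketch along these lines may help. It reuses the sample tags from the parser answer. The pattern is an assumed variation, not the exact one above: `[^>]*` inside the lookahead lets the `type` attribute appear anywhere in the opening tag, and `#` is used as the delimiter so the slashes need no escaping.

```php
<?php
// Sample input taken from the parser-based answer.
$content = <<<HTML
<script type="text/javascript">some js code here</script>
<script type="application/javascript">some other code here</script>
<script>This looks naked, dude!</script>
<script type="application/ld+json">THIS MUST NOT BE MATCHED</script>
HTML;

// The negative lookahead rejects the tag if the JSON-LD type attribute
// appears anywhere before the closing ">" of the opening tag.
$pattern = '#<script(?![^>]*\stype="application/ld\+json")[^>]*>(.*?)</script>#is';

preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

// Keep only capture group 1 (the script body) from each match.
$bodies = array_column($matches, 1);

print_r($bodies);
```

The attribute value is still matched literally with double quotes, so single-quoted or unquoted attributes would slip through; the parser approach does not have that limitation.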
Answer 2 (score: 0)
The opposite of @Wiktor's solution: to match any JavaScript...
<script type="text/javascript">...</script>
<script type="application/javascript">...</script>
<script>...</script>
...and skip any other types, use:
#<script(?=[^>]*\stype="(?:application|text)/javascript"|(?![^>]*\stype=))[^>]*>(.*?)</script>#is
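To check a whitelist pattern of this kind against the sample tags, a small PHP sketch such as the following can help. The pattern used here states the rule with lookaheads: accept the tag if it carries a JavaScript `type`, or if it has no `type` attribute at all. The variable names are illustrative only.

```php
<?php
$content = <<<HTML
<script type="text/javascript">some js code here</script>
<script type="application/javascript">some other code here</script>
<script>This looks naked, dude!</script>
<script type="application/ld+json">THIS MUST NOT BE MATCHED</script>
HTML;

// First alternative: a JavaScript type attribute is present in the opening tag.
// Second alternative: the opening tag has no type attribute at all.
$pattern = '#<script(?=[^>]*\stype="(?:application|text)/javascript"|(?![^>]*\stype=))'
         . '[^>]*>(.*?)</script>#is';

preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

// Keep only capture group 1 (the script body) from each match.
$scripts = array_column($matches, 1);

print_r($scripts);
```

Because the rule is a whitelist, any other `type` value (for example `text/template`) is skipped as well, not only JSON-LD.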