Regex - preg_split to get inline script tags

Date: 2013-11-22 01:55:47

Tags: php regex preg-split

I am trying to split a string into its individual inline script tags:

<script>
    console.log('hello');
    console.log('hi!');
    console.log('yo!');
</script>
<script type="text/javascript">
    console.log("this is another inline script");
    var hi = "cool";
    console.log(hi);
</script>

This is the pattern I am using:

$scripts = preg_split('#(<script>.*?</script>|<script type="text/javascript>.*?</script>")#', $str);    

But I get this result:

Array
(
    [0] =>     <script>
        console.log('hello');
        console.log('hi!');
        console.log('yo!');
    </script>
    <script type="text/javascript">
        console.log("this is another inline script");
        var hi = "cool";
        console.log(hi);
    </script>
)

While I expected to get something like this:

Array
(
    [0] =>     <script>
        console.log('hello');
        console.log('hi!');
        console.log('yo!');
    </script>
    [1] =>
    <script type="text/javascript">
        console.log("this is another inline script");
        var hi = "cool";
        console.log(hi);
    </script>
)

What is wrong with the pattern I am using? Thanks in advance!

Update

If I use the s modifier, I get a result like this:

Array
(
    [0] => 
    [1] => 
<script type="text/javascript">
            console.log("this is another inline script");
            var hi = "cool";
            console.log(hi);
</script>
)

It manages to separate the 2 scripts, but the first script becomes an empty string.

2 answers:

Answer 0: (score: 1)

Let me make a list:

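  • The dot (.) does not match newlines unless PCRE_DOTALL (the /s flag) is set.

  • For preg_split, you also need the PREG_SPLIT_DELIM_CAPTURE option to get hold of the matched parts; without it they are treated as delimiters and discarded.

  • In your case, it is better to use preg_match_all instead of preg_split.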
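The first two points can be sketched as follows. This is a minimal illustration, not the asker's exact code: the shortened input in $str and the simplified pattern <script[^>]*> are assumptions made for the example.

```php
<?php
// Sample input, assumed to mirror the markup from the question.
$str = "<script>\n    console.log('hello');\n</script>\n"
     . "<script type=\"text/javascript\">\n    var hi = \"cool\";\n</script>";

// Without the s modifier, "." stops at newlines, so nothing matches
// and preg_split returns the whole input as a single element.
$noDotAll = preg_split('#<script[^>]*>.*?</script>#', $str);

// With the s modifier both blocks match, but they act as delimiters
// and are dropped; only the text around them is returned.
$dotAll = preg_split('#<script[^>]*>.*?</script>#s', $str);

// A capturing group plus PREG_SPLIT_DELIM_CAPTURE puts the matched
// blocks back into the result; PREG_SPLIT_NO_EMPTY drops empty pieces.
$captured = preg_split(
    '#(<script[^>]*>.*?</script>)#s',
    $str,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

var_dump($noDotAll, $dotAll, $captured);
```

Note that the newline between the two blocks still shows up as its own element in $captured, which is one reason preg_match_all tends to be the more convenient tool here.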

Finally, anticipating your next question: your expression does not match your source, because the closing quote is misplaced:

...>|<script type="text/javascript>.*?<....
                                  ^
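A quick way to see the effect of the misplaced quote is to test both variants against one tag. The s modifier and the variable names below are additions for this sketch and are not part of the original pattern.

```php
<?php
// Pattern from the question (plus the s modifier): the closing double
// quote after text/javascript sits after </script> instead of before ">".
$broken = '#(<script>.*?</script>|<script type="text/javascript>.*?</script>")#s';

// Same pattern with the quote moved back to where the markup has it.
$fixed = '#(<script>.*?</script>|<script type="text/javascript">.*?</script>)#s';

$tag = "<script type=\"text/javascript\">\nvar hi = \"cool\";\n</script>";

var_dump(preg_match($broken, $tag)); // int(0)
var_dump(preg_match($fixed, $tag));  // int(1)
```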

In summary, you are better off using the following:

preg_match_all("~( <script[^>]*>  (.*?)  </script> )~smix", $src, ...
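For reference, a complete call might look like the sketch below. The sample input and the $matches variable name are assumptions; the pattern is the one shown above. It also assumes that no script body contains a literal </script>.

```php
<?php
// Sample input, assumed to mirror the markup from the question.
$src = <<<STR
<script>
    console.log('hello');
</script>
<script type="text/javascript">
    var hi = "cool";
</script>
STR;

// s: "." also matches newlines
// i: tag names match case-insensitively
// x: whitespace inside the pattern is ignored
// m: has no effect here, since the pattern uses neither ^ nor $
preg_match_all("~( <script[^>]*>  (.*?)  </script> )~smix", $src, $matches);

// $matches[1] holds each complete <script>...</script> block,
// $matches[2] holds only the code between the tags.
var_dump($matches[1], $matches[2]);
```

With the default PREG_PATTERN_ORDER, each index of $matches lists one capture group across all matches, so $matches[1][0] is the first full block and $matches[2][0] is its body.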

Answer 1: (score: 1)

Try this:

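$str=<<<STR
<script>
    console.log('hello');
    console.log('hi!');
    console.log('yo!');
</script>
<script type="text/javascript">
    console.log("this is another inline script");
    var hi = "cool";
    console.log(hi);
</script>
STR;

// Split at every position that is followed by "<script" (zero-width lookahead),
// so nothing is consumed. -1 means no limit; PREG_SPLIT_NO_EMPTY drops the
// empty piece produced by the match at the very start of the string.
$split = preg_split('#(?=<script)#', $str, -1, PREG_SPLIT_NO_EMPTY);
var_dump($split);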

I changed the regex to:

#(?=<script)#

The result is:

array(2) {
  [0]=>
  string(93) "<script>
    console.log('hello');
    console.log('hi!');
    console.log('yo!');
</script>
"
  [1]=>
  string(133) "<script type="text/javascript">
    console.log("this is another inline script");
    var hi = "cool";
    console.log(hi);
</script>"
}