DOMDocument - Remove script tags and all of their content

Time: 2016-10-17 17:33:24

Tags: php domdocument

I am trying to remove all script content from an HTML string with the following code:

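    $dom = new DOMDocument();
    // Prepend a meta tag so the parser reads the input as UTF-8.
    $dom->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);

    // getElementsByTagName() returns a live list, so copy the nodes
    // into an array before removing any of them.
    $remove = [];
    foreach ($dom->getElementsByTagName('script') as $item) {
        $remove[] = $item;
    }

    foreach ($remove as $item) {
        $item->parentNode->removeChild($item);
    }

    // saveHTML() returns the markup, so keep the result.
    $html = $dom->saveHTML();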

The script I am trying to remove is:

<script type="text/javascript">
/* DO NOT MODIFY */
(function(w,d) {
  w.cdxhd = w.cdxhd || [];
  w.cdxhd.push({
    "cdxhd_bpid":158716,
    "cdxhd_w"   :300,
    "cdxhd_h"   :250,
    "cdxhd_vtag":20140808,
    "cdxhd_slot":w.cdxhd.length
  });
 d.write('<div class="cdxhd_'+cdxhd[(cdxhd.length-1)]['cdxhd_bpid']+' cdxhd_slot" id="cdxhd_bpid_'+cdxhd[(cdxhd.length-1)]['cdxhd_bpid']+'_'+cdxhd[(cdxhd.length-1)]['cdxhd_slot']+'" style="position:relative;"></div>');
 var s = d.createElement("script"); s.async = true;
 s.src = (d.location.protocol==="https:"?"https":"http")+'://ca.cubecdn.net/js/loader_v2.js?cb='+(new Date().getMinutes()+new Date().getHours());
 var a = d.getElementsByTagName("script")[0]; a.parentNode.insertBefore(s,a);
}(this,document));
</script>

But I still get:

 var s = d.createElement("script"); s.async = true;\n
 s.src = (d.location.protocol==="https:"?"https":"http")+'://ca.cubecdn.net/js/loader_v2.js?cb='+(new Date().getMinutes()+new Date().getHours());\n
 var a = d.getElementsByTagName("script")[0]; a.parentNode.insertBefore(s,a);\n
}(this,document));

The last part of the script, everything after the `d.write(...)` line, is not removed.

Is there any way to remove all scripts and their content without using regular expressions?
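
One regex-free direction would be to cut the script blocks out with plain string functions before the HTML ever reaches `loadHTML()`. The sketch below is only an illustration under stated assumptions, not a confirmed fix: `stripScriptBlocks` is a hypothetical helper name, it assumes each script ends at the first `</script` that follows its opening tag, and it would also match any tag whose name merely begins with `script`. A possible cause of the leftover text, which is an assumption and not confirmed anywhere in this thread, is that the `</div>` inside the `d.write(...)` string makes the HTML parser end the script element early.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: drop every <script ...>...</script> block
 * using only string functions (no regular expressions, no DOM).
 */
function stripScriptBlocks(string $html): string
{
    $result = '';
    $offset = 0;

    // stripos() is case-insensitive, so <SCRIPT> is matched too.
    while (($start = stripos($html, '<script', $offset)) !== false) {
        // Keep everything between the previous block and this one.
        $result .= substr($html, $offset, $start - $offset);

        // Assumption: the block ends at the next closing script tag.
        $end = stripos($html, '</script', $start);
        if ($end === false) {
            // Unclosed script: discard the rest of the input.
            return $result;
        }

        // Move past the '>' that terminates the closing tag.
        $close = strpos($html, '>', $end);
        if ($close === false) {
            return $result;
        }
        $offset = $close + 1;
    }

    // Append whatever follows the last script block.
    return $result . substr($html, $offset);
}

// Usage: the markup inside the string literal no longer matters.
$sample = '<p>before</p>'
    . '<script type="text/javascript">d.write(\'<div></div>\'); var s = 1;</script>'
    . '<p>after</p>';

echo stripScriptBlocks($sample), PHP_EOL; // prints <p>before</p><p>after</p>
```

The cleaned string could then be passed to `loadHTML()` as before if further DOM work is needed. The trade-off is that this scans raw text rather than parsed nodes, so a literal `</script` inside a JavaScript string would still end the block early.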

0 answers:

No answers yet