Question

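The existing topic on this does not let me ask for solutions scoped to the body and/or head, to inline handlers, by index, and so on.

I want control over which scripts I remove and how many.

I'd rather not go through the argument about using something other than regex. My favorite answer on that topic comes from Binh: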

$html = preg_replace("/<script.*?\/script>/s", "", $html) ? : $html;

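I'd like this approach with as much granular control as possible, but as written it removes scripts from the whole of $content. I'd like to see it remove scripts from the body only (or from the top of the body down to the very end).

Likewise, removing scripts from the head only (or from the very top down to the body). Also by index, for example the first script in the body, the fourth in the head, and so on.

Finally, I'd like to see inline element JavaScript removed, again with as much control as possible.

Thanks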

Answer 1

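I will get to your question eventually, but first let me comment on what you are about to do.

Since you haven't said, I'm not sure why you want to do this. Collecting raw HTML from users and then displaying it elsewhere is considered a huge security hole. Getting rid of all JavaScript with pure regular expressions will be difficult. Removing script tags is easy; removing inline JavaScript is the hard part. Although it is possible, I'd suggest finding another way to accomplish your task rather than serving users a JavaScript-stripped version of a web page.

You can achieve this with an iframe. Using

<iframe src="html_you_want_to_strip" sandbox=""></iframe>

stops all JavaScript inside the iframe from running. Keep in mind that there are other ways to load malicious content into your site without using JavaScript.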
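If the untrusted markup is held in a PHP string rather than served from its own URL, one way to apply the same idea is the `srcdoc` attribute. The following is only a minimal sketch: the helper name `sandboxedFrame()` is hypothetical, and using `srcdoc` instead of `src` is an assumption that is not part of the original answer.

```php
<?php
// Hypothetical helper: wraps untrusted HTML in a sandboxed iframe via srcdoc.
function sandboxedFrame(string $untrustedHtml): string
{
    // Escape the markup so it cannot break out of the attribute value.
    $escaped = htmlspecialchars($untrustedHtml, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // An empty sandbox attribute applies every restriction, including "no scripts".
    return '<iframe sandbox="" srcdoc="' . $escaped . '"></iframe>';
}

$frame = sandboxedFrame('<p>Hello</p><script>alert(1)</script>');
```

The browser decodes the attribute value and renders it inside the frame, while the sandbox keeps any scripts in it from executing.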

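Now that I've explained what you should consider when stripping JavaScript, here is the answer to your question.

A. Removing script tags from only the body or the head:

The best way to get granularity when removing JavaScript is to use PHP's DOMDocument class. Basically, you load the document into a DOMDocument and remove whichever script tags you want. For example, if you only want to get rid of the script tags in the body, you could write: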

<?php
$html = "the HTML you want filtered";
$DOM = new DOMDocument('1.0', 'utf-8');
$DOM->loadHTML($html);
$bodyTags = $DOM->getElementsByTagName('body');
/*
 We will run under the assumption that the user has the ability to add two
 body tags and hide information in the second one, that is why we don't
 just use $DOM->getElementsByTagName('body')[0]
*/
foreach ($bodyTags as $body) {
    /*
     getElementsByTagName() returns a live list, so copy it to an array
     first; removing nodes while iterating the live list skips elements
    */
    $scripts = iterator_to_array($body->getElementsByTagName('script'));
    foreach ($scripts as $script) {
        // Detach the node through its parent (PHP 8+ also offers $script->remove())
        $script->parentNode->removeChild($script);
    }
}

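The same code can be adapted to remove scripts from the head tag by querying 'head' instead of 'body'. If you want to keep or remove scripts depending on their index, you can keep a counter inside the foreach,

as follows: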

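$i = 0;
// Copy the live node list so removals do not disturb the iteration
foreach (iterator_to_array($body->getElementsByTagName('script')) as $script) {
    if ($i !== (INDEX_TO_KEEP)) {
        $script->parentNode->removeChild($script);
    }
    $i++; // advance the position counter for every script, kept or removed
}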
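As a rough sketch of how the head/body and by-index cases could be folded into one routine, consider the helper below. The function name `removeScripts()`, its parameters, and the sample markup are all hypothetical and not part of the original answer; indexes are zero-based and counted separately inside each matching container.

```php
<?php
/**
 * Hypothetical helper: remove <script> elements inside a section ('head' or 'body').
 * $indexes = null removes all of them; otherwise only the zero-based positions listed.
 */
function removeScripts(DOMDocument $dom, string $section, ?array $indexes = null): int
{
    $removed = 0;
    foreach ($dom->getElementsByTagName($section) as $container) {
        // Copy the live node list so removals do not shift the iteration.
        $scripts = iterator_to_array($container->getElementsByTagName('script'), false);
        foreach ($scripts as $i => $script) {
            if ($indexes === null || in_array($i, $indexes, true)) {
                $script->parentNode->removeChild($script);
                $removed++;
            }
        }
    }
    return $removed;
}

$dom = new DOMDocument();
$dom->loadHTML('<html><head><script>a()</script><script>b()</script></head>'
    . '<body><p>Hi</p><script>c()</script></body></html>');
removeScripts($dom, 'head', [1]); // drop only the second script in the head
removeScripts($dom, 'body');      // drop every script in the body
$result = $dom->saveHTML();
```

Returning the number of removed nodes is a design choice made here so the caller can check whether the expected scripts were actually found.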

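B. Removing inline JavaScript

We can use the same DOMDocument parser, except this time we walk every element and look for JavaScript event attributes (luckily, all JavaScript event attributes start with "on"). The code looks like this.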

<?php
//starting where the last code leaves off
foreach ($DOM->getElementsByTagName('*') as $element) {
    //This selects all elements
    //Copy the attribute list first, since removing attributes changes it
    foreach (iterator_to_array($element->attributes) as $attribute) {
        if (preg_match('/^on/i', $attribute->name) === 1) {
            /*
             "^on" matches attribute names that begin with "on"
             (onmousemove, onload, etc.); without the anchor,
             names such as "content" would match too
            */
            $element->removeAttribute($attribute->name);
        }
    }
}

At the end of the code, you need to save the stripped HTML so it can be returned to the user:

$parsedHTML = $DOM->saveHTML();
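For finer control over which inline handlers are removed, the attribute loop could be wrapped in a function that accepts a list of handler names to leave alone. This is only a sketch under assumptions: the name `removeInlineHandlers()`, the `$keep` parameter, and the sample markup are hypothetical and not part of the original answer.

```php
<?php
/**
 * Hypothetical helper: strip inline event-handler attributes (onclick, onload, ...).
 * Attribute names listed in $keep (lowercase) are left untouched.
 */
function removeInlineHandlers(DOMDocument $dom, array $keep = []): int
{
    $removed = 0;
    foreach ($dom->getElementsByTagName('*') as $element) {
        // Snapshot the attribute map before mutating it.
        foreach (iterator_to_array($element->attributes, false) as $attribute) {
            $name = strtolower($attribute->name);
            if (strpos($name, 'on') === 0 && !in_array($name, $keep, true)) {
                $element->removeAttribute($attribute->name);
                $removed++;
            }
        }
    }
    return $removed;
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body onload="init()"><a href="#" onclick="track()">x</a></body></html>');
$count = removeInlineHandlers($dom, ['onload']); // keep onload, strip the rest
$clean = $dom->saveHTML();
```

Note that this sketch only covers `on*` attributes; it does not touch `javascript:` URLs in attributes such as `href`.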
