Question

I have a problem with the code I'm currently running.

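My code works like this: I enter a URL, and when I click submit it strips all the tags; I use strip_tags for that. Then I use preg_match_all("/((?:\w'|\w|-)+)/", $contents, $words); to build an array of every word. Then a foreach loop counts all the words, and a second foreach loop puts them in a table.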
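A minimal sketch of the pipeline described above, assuming the page's HTML has already been fetched into a string (fetching the URL and printing the table are left out, and the function name `countWords` is hypothetical). It reproduces the problem: `strip_tags` removes the markup but keeps the text inside `<script>`, so `alert` ends up in the counts.

```php
<?php
// Hypothetical helper mirroring the question's steps:
// strip_tags -> preg_match_all with the word pattern -> count occurrences.
function countWords(string $html): array
{
    // strip_tags removes the tags themselves but keeps the text between
    // them, including the JavaScript inside <script> elements.
    $contents = strip_tags($html);

    // Same pattern as in the question: word characters, hyphens, and a
    // word character followed by an apostrophe.
    preg_match_all("/((?:\w'|\w|-)+)/", $contents, $words);

    // $words[0] holds the full matches; tally each distinct word.
    $counts = [];
    foreach ($words[0] as $word) {
        $counts[$word] = ($counts[$word] ?? 0) + 1;
    }
    return $counts;
}

$html = "<p>Testpage-p</p>\n<script>alert('hallo');</script>";
print_r(countWords($html));
```

The newline between `</p>` and `<script>` matters: `strip_tags` does not insert whitespace, so adjacent elements without whitespace between them would merge into one "word".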

Here is the problem. For example, suppose the URL I enter contains the following:

<html>
    <head>
        <title>titel1</title>
    </head>
    <body>
        <div id="div1">
            <h1 class="class2">
                Testpage-h1
            </h1>
            <p>
                Testpage-p
            </p>
        </div>
        <script>
            alert('hallo');
            document.getElementById('class2');
        </script>
    </body>
</html>

With my code, this echoes the following:

document         1
getElementById1  1
class2'          1
hallo            1
alert            1
Testpage-h1      1
Testpage-p       1
titel1           1

(Sorry for formatting this as 'code', but otherwise it wouldn't let me use line breaks to put the numbers under each other.)

My problem is that it should not show the content between the <script></script> tags, because that is useless to me. Is there a solution for this?

I have already tried things like sanitize filters, but they didn't help.

Answer 1

You can remove <script> ... </script> from the string before doing any counting:

$text = preg_replace('#<script(.*?)>(.*?)</script>#is', '', $text);
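Plugged into the question's pipeline, the regex simply runs before `strip_tags`. A minimal sketch, assuming the HTML is already in a string; `countWordsWithoutScripts` is a hypothetical name, and the word pattern is the one from the question:

```php
<?php
// Hypothetical helper: drop <script> blocks first, then strip the
// remaining tags and count words as in the question.
function countWordsWithoutScripts(string $html): array
{
    // 'i' makes the match case-insensitive (<SCRIPT> too); 's' lets '.'
    // match newlines so multi-line scripts are removed; the lazy '.*?'
    // ends each match at the first following </script>.
    $text = preg_replace('#<script(.*?)>(.*?)</script>#is', '', $html);

    $contents = strip_tags($text);
    preg_match_all("/((?:\w'|\w|-)+)/", $contents, $words);

    $counts = [];
    foreach ($words[0] as $word) {
        $counts[$word] = ($counts[$word] ?? 0) + 1;
    }
    return $counts;
}

$html = "<h1 class=\"class2\">\n  Testpage-h1\n</h1>\n<script>\n  alert('hallo');\n</script>";
print_r(countWordsWithoutScripts($html));
```

Without the `s` modifier, a script spanning several lines (as in the question's example page) would not match and its text would still be counted.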

Or use one of the other solutions from "remove script tag from HTML content" (slower, but sometimes more correct):

$doc = new DOMDocument();

// load the HTML string we want to strip
$doc->loadHTML($html);

// get all the script tags
$script_tags = $doc->getElementsByTagName('script');

// the node list is "live": removing a node shifts the later ones down,
// so walk it backwards to make sure every script tag is removed
for ($i = $script_tags->length - 1; $i >= 0; $i--) {
  $script = $script_tags->item($i);
  $script->parentNode->removeChild($script);
}

// get the HTML string back
$no_script_html_string = $doc->saveHTML();
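For the word count itself, the text can also be read straight from the cleaned DOM. A minimal sketch, assuming the whole page is in `$html`; `countWordsDom` is a hypothetical name, and reading `documentElement->textContent` is a swapped-in alternative to the answer's `saveHTML()` plus `strip_tags` route, not something the answer prescribes:

```php
<?php
// Hypothetical helper: remove <script> elements with DOMDocument, then
// count words in the text that remains.
function countWordsDom(string $html): array
{
    $doc = new DOMDocument();
    // Real pages are often not valid HTML; collect libxml warnings
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Live node list: iterate backwards so removals don't skip nodes.
    $scripts = $doc->getElementsByTagName('script');
    for ($i = $scripts->length - 1; $i >= 0; $i--) {
        $node = $scripts->item($i);
        $node->parentNode->removeChild($node);
    }

    // Concatenated text of everything left under <html>.
    $contents = $doc->documentElement ? $doc->documentElement->textContent : '';
    preg_match_all("/((?:\w'|\w|-)+)/", $contents, $words);

    $counts = [];
    foreach ($words[0] as $word) {
        $counts[$word] = ($counts[$word] ?? 0) + 1;
    }
    return $counts;
}

$html = "<html><body><h1> Testpage-h1 </h1><p> Testpage-p </p><script>alert('hallo');</script></body></html>";
print_r(countWordsDom($html));
```

If `<style>` blocks also show up in the counts, the same backwards loop over `getElementsByTagName('style')` should remove them.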
