Question

我正在尝试使用正则表达式从源代码中删除脚本标记。

/<\s*script[^>]*[^\/]>(.*?)<\s*\/\s*script\s*>/is

但是当我需要删除其他代码中的代码时，我遇到了问题。

Please see this screenshot

我在https://regex101.com/r/R6XaUT/1

进行了测试

如何正确创建正则表达式以覆盖所有代码？

Answer 1

示例文字

$text = '<b>sample</b> text with <div>tags</div>';

strip_tags（$ text）的结果：

Output: sample text with tags

strip_tags_content（$ text）的结果：

Output: text with

strip_tags_content的结果（$ text，＆＃39;＆＃39;）：

Output: <b>sample</b> text with

strip_tags_content的结果（$ text，＆＃39;＆＃39;，TRUE）;

Output: text with <div>tags</div>

我希望有人有用:) source link

Answer 2

只需使用PHP函数strip_tags即可。参见

http://php.net/manual/de/function.strip-tags.php

$string = "<div>hello</div>";
echo strip_tags($string);

将输出

hello

您还可以提供要保留的标记列表。

==

另一种方法是：

// Load a file into $html
$html = file_get_contents('scratch.html');
$matches = [];
preg_match_all("/<\/*([^\s>]*)>/", $html, $matches);

// Have a list of all Tags only once
$tags = array_unique($matches[1]);

// Find the script index and remove it
$scriptTagIndex = array_search("script", $tags);
if($scriptTagIndex !== false) unset($tags[$scriptTagIndex]);

// Taglist must be a string containing <tagname1><tagename2>...
$allowedTags = array_map(function ($s) { return "<$s>"; }, $tags);

// Stript the HTML and keep all Tags except for removed ones (script)
$noScript = strip_tags($html,join("", $allowedTags));

echo $noScript;

如何通过正则表达式删除另一个代码中的脚本标记

2 个答案: