Question

How do I strip all HTML tags in PHP, except for text in <> that is not really a tag?

// There are other HTML tags too, like h1, div, etc.
echo strip_tags('<gone with the wind> <p>a hotest book</p>');

This returns "a hotest book", but I need to keep the book title. I need the function to return "<gone with the wind> a hotest book".

Answer 1

You should consider writing the brackets around the title as entities: &lt; for < and &gt; for >.
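As a minimal sketch of that idea, assuming you control the input and can store the title with entities (the sample string below is made up for illustration): strip_tags does not treat entities as tags, so the title survives and can be decoded afterwards if you need raw angle brackets.

```php
<?php
// Assumption: the title arrives with entities instead of raw angle brackets.
$html = '&lt;gone with the wind&gt; <p>a hotest book</p>';

// Entities are not tags, so strip_tags leaves them alone.
$text = strip_tags($html);
echo $text, "\n";

// Decode only if the consumer needs literal < and > (e.g. plain-text output).
$decoded = html_entity_decode($text);
echo $decoded, "\n";
```

If the result is going into an HTML page, you would normally print $text and leave the entities encoded.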

Answer 2

The following uses DOM to find any element that is not a valid HTML4 element and treats it as a book title. Those names are then whitelisted in strip_tags.

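// Sample input from the question.
$html = '<gone with the wind> <p>a hotest book</p>';

// Collect libxml parse errors instead of emitting warnings.
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);

// Build the strip_tags whitelist from every "Tag X invalid" error.
echo strip_tags($html, implode('',
    array_map(
        function($error) {
            return '<' . sscanf($error->message, 'Tag %s invalid')[0] . '>';
        },
        libxml_get_errors()
    )
));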


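Note that any book title starting with a valid HTML tag name will be treated as valid HTML and therefore stripped (for example "Body of Evidence" or "Head First PHP"). Also note that <gone with the wind> is parsed as an element named "gone" with the attributes "with", "the" and "wind". For valid elements, you could check whether they have only empty attributes and treat those as titles too, but that still will not be 100% accurate if a title consists solely of valid element names. In addition, you could check for closing tags, but I don't know how to do that with DOM (XMLParser can detect them, though).

In any case, agreeing on a better format for these book titles, for example a namespace or a delimiter other than angle brackets, will greatly improve your chances of doing this correctly.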
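To illustrate the delimiter suggestion, here is a rough sketch. The [[ ]] delimiter is purely hypothetical; any marker that cannot be mistaken for a tag would do. The markup is stripped first, and the delimiter is converted to angle brackets only for display.

```php
<?php
// Hypothetical format: titles are wrapped in [[ ]] rather than < >.
$html = '[[gone with the wind]] <p>a hotest book</p>';

// Nothing in [[ ]] looks like a tag, so strip_tags is safe here.
$text = strip_tags($html);

// Turn the delimiter into angle brackets for display only.
$display = preg_replace('/\[\[(.+?)\]\]/', '<$1>', $text);
echo $display, "\n";
```

With this format a title such as [[Head]] can no longer collide with a real HTML element.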

Answer 3

Here is a simple, though not foolproof, solution.

PHP

$data = "<gone with the wind> <p>a hotest book</p>";
$out = preg_replace("/\<\w+\>|\<\/\w+\>/im", "", $data);

var_dump($out);

Output

string '<gone with the wind> a hotest book' (length=34)

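Will match:

<p>text</p>
<anything>text</anything>

Won't match:

<img src="url">

As has been said before, the code has no way of knowing what the book titles look like. Still, if you expect your data to contain only simple <p> tags, this will do.

It's a crazy solution, but I thought I'd throw it out there.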
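If tags with attributes also need to go, one possible extension (a sketch, not part of the original answer) is to accept attributes only in name="value" form. A title such as <gone with the wind> has bare words and no = sign, so it is left alone. The class and src values in the sample input are made up.

```php
<?php
$data = '<gone with the wind> <p class="intro">a hotest book</p><img src="cover.png">';

// Match <tag>, </tag> and tags whose attributes are all name="value" pairs.
$pattern = '~</?\w+(?:\s+[\w-]+=(?:"[^"]*"|\'[^\']*\'))*\s*/?>~';

$out = preg_replace($pattern, '', $data);
echo $out, "\n";
```

This still strips one-word titles such as <head>, and it keeps tags with valueless attributes such as <input disabled>, so it is no more foolproof than the pattern above.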

Answer 4

You could also do it more simply, like this.

   <?php
   $string = htmlspecialchars("<gone with the wind>");
   echo strip_tags( "$string <p>a hotest book</p>");
   ?>

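This will output the following as rendered in a browser (the raw string contains &lt; and &gt; in place of the angle brackets):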

   <gone with the wind> a hotest book


Answer 5

$string = '<gone with the wind> <p>a hotest book</p>';
$string = strip_tags(preg_replace("/<([\w\s\d]{6,})>/", "&lt;$1&gt;", $string));
$string = html_entity_decode($string);

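The above converts every "tag" that has six or more characters between < and > into &lt; and &gt; form, after which you can use strip_tags.

You may need to experiment with the value six, depending on your incoming data. If you receive tags such as <article>, you may need to push it higher.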
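To see how the threshold changes the result, here is a small sketch that wraps the same idea in a function. The name stripShortTags and its $minLength parameter are hypothetical, introduced only for this illustration.

```php
<?php
// Hypothetical helper: $minLength is the threshold discussed above.
function stripShortTags(string $input, int $minLength): string
{
    // Protect anything in <> that is at least $minLength word/space characters.
    $pattern = '/<([\w\s]{' . $minLength . ',})>/';
    $escaped = preg_replace($pattern, '&lt;$1&gt;', $input);

    // Strip what is left, then restore the protected brackets.
    return html_entity_decode(strip_tags($escaped));
}

$input = '<gone with the wind> <article>a hotest book</article>';

$low  = stripShortTags($input, 6); // "article" (7 characters) is protected
$high = stripShortTags($input, 8); // "article" is now stripped as a tag
echo $low, "\n", $high, "\n";
```

With a threshold of 6 the opening <article> is kept while its closing tag is still removed, which is why the value may need raising. Keep in mind that html_entity_decode also decodes any other entities already present in the input.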

Answer 6

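The best I can think of is something like this. Since I don't know which tags will be used, I simply assume all of them. This should remove any valid HTML tag, rather than everything that merely looks like it might be a tag.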

<?php
$tags = array("!DOCTYPE","a","abbr","acronym","address","applet","area","article","aside","audio","b","base","basefont","bdi","bdo","big","blockquote","body","br","button","canvas","caption","center","cite","code","col","colgroup","command","datalist","dd","del","details","dfn","dir","div","dl","dt","em","embed","fieldset","figcaption","figure","font","footer","form","frame","frameset","h1","h2","h3","h4","h5","h6","head","header","hgroup","hr","html","i","iframe","img","input","ins","kbd","keygen","label","legend","li","link","map","mark","menu","meta","meter","nav","noframes","noscript","object","ol","optgroup","option","output","p","param","pre","progress","q","rp","rt","ruby","s","samp","script","section","select","small","source","span","strike","strong","style","sub","summary","sup","table","tbody","td","textarea","tfoot","th","thead","time","title","tr","track","tt","u","ul","var","video","wbr");

$string = "<gone with the wind> <p>a hotest book</p>";


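// The lookahead requires the tag name to end here, so "p" does not also match <pride ...>.
echo preg_replace("/<\/?(?:" . implode("|", $tags) . ")(?=[\s\/>])[^>]*>/i", "", $string);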

The final output looks like this:

<gone with the wind> a hotest book

Answer 7

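You're going to be out of luck here, because you have no way of knowing which things in <> are HTML tags and which are book titles. You can't even rely on spotting things that look like tags but aren't valid HTML tags, because you might get a record for the Monkees' 1968 movie "Head", which would be written as <Head>, and that is of course a valid HTML tag.

You need to work this out with the supplier of your data before you can use PHP's strip_tags function.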

php strip tags, except '&lt;&gt;' (book names)
