Question

A text field can end up padded at the end with any combination of:

<p></p> 
<p>&nbsp;</p>
<br>
<span></span>
<div></div>

and some other variants, including spaces and &nbsp;.

I want to strip this padding because it breaks the formatting on the web page.

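I am thinking of a recursive function that removes any trailing <br> and "&nbsp;", then looks for the last closing tag, finds the matching opening tag, and feeds the enclosed content back to itself. If the returned content is empty, the tag is removed.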

It could run as a stored procedure in MS SQL Server 2008, in VBScript (classic ASP), or in PHP.

Answer 1

This can be done with a regular expression; I don't think the DOM is the simplest approach in this case. An example in PHP:

$pattern = '~(?><(p|span|div)\b[^>]*+>(?>\s++|&nbsp;)*</\1>|<br/?+>|&nbsp;|\s++)+$~i';
$result = preg_replace($pattern, '', $text);

Explanation:

~
 (?>                          # open an atomic group
     <(p|span|div)\b[^>]*+>   # opening tags, note that this subpattern allows
                              # attributes with [^>]*+ you can remove it if you
                              # don't need it
           (?>\s++|&nbsp;)*   # content allowed inside the tags *

     </\1>                    # closing tag (refer to the first capturing group)
   |                          # OR
     <br/?+>                  # stand alone tag <br>
   |                          # OR
     &nbsp;                   # &nbsp;
   |                          # OR
     \s++                     # whitespace characters
  )+$
~i
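
As a quick check of the pattern above, here is a minimal sketch that runs it on a made-up string. The sample value of `$text` is an assumption chosen for illustration, not data from the question.

```php
<?php
// Pattern from the answer: strips trailing empty tags, <br>, &nbsp; and whitespace.
$pattern = '~(?><(p|span|div)\b[^>]*+>(?>\s++|&nbsp;)*</\1>|<br/?+>|&nbsp;|\s++)+$~i';

// Hypothetical sample input: real content followed by trailing padding.
$text = "<p>Hello</p><p>&nbsp;</p><br> <span></span>\n<div> </div>&nbsp;";

// Only the run of padding that reaches the end of the string is removed.
$result = preg_replace($pattern, '', $text);
echo $result, "\n"; // prints: <p>Hello</p>
```

Because the pattern ends with `$`, empty tags in the middle of the text are left alone; only padding that extends to the very end is stripped.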

(*) Note that this pattern does not handle nested tags such as <div><p></p></div>, but a recursive pattern solves that:

$pattern = '~(<(p|span|div)\b[^>]*+>(?1)*</\2>|<br/?+>|&nbsp;|\s++)+$~i';

Here (?1) is a recursive call to the subpattern of the first capturing group.
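
To see the recursion at work, the following sketch applies the recursive pattern to a made-up string in which the trailing padding is nested. The sample input is an assumption for illustration only.

```php
<?php
// Recursive pattern from the answer: (?1) re-enters capturing group 1,
// so empty tags nested inside other empty tags are matched as well.
$pattern = '~(<(p|span|div)\b[^>]*+>(?1)*</\2>|<br/?+>|&nbsp;|\s++)+$~i';

// Hypothetical sample input: the trailing <div> wraps an otherwise empty <p>.
$text = '<p>Keep me</p><div class="pad"><p>&nbsp;<br></p> </div><br/>';

$result = preg_replace($pattern, '', $text);
echo $result, "\n"; // prints: <p>Keep me</p>
```

The backreference `\2` still matches the right closing tag here because PCRE restores the captured tag name once each recursive call returns.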

Answer 2

The simplest answer is this, and it doesn't involve a complicated regular expression:

$html = str_replace('<span></span>', '', $html);
$html = str_replace('<p></p>', '', $html);
$html = str_replace('<div></div>', '', $html);

Replace $html with the variable holding your output string.

Simples！
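
If nested empty tags matter, the same literal-replacement idea can be repeated until the string stops changing. This is a minimal sketch; the helper name `stripEmptyTags` is hypothetical and not part of the answer.

```php
<?php
// Hypothetical helper (not from the answer): repeat the literal replacements
// until nothing changes, so nesting such as <div><p></p></div> collapses too.
function stripEmptyTags(string $html): string
{
    $empty = ['<span></span>', '<p></p>', '<div></div>'];
    do {
        $before = $html;
        $html = str_replace($empty, '', $html);
    } while ($html !== $before);
    return $html;
}

echo stripEmptyTags('<p>Text</p><div><p><span></span></p></div>'), "\n"; // prints: <p>Text</p>
```

Unlike the regular expression in Answer 1, this removes empty tags anywhere in the string, not only at the end, and it does not cover `<br>`, `&nbsp;`, whitespace, or tags with attributes.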

Regex to trim HTML whitespace at the end
