Remove all HTML tags except <a>, <br/>, <b> and <img/>

Time: 2017-09-28 10:26:09

Tags: javascript html regex ecmascript-5

When reading the HTML body of some emails, I often get lots of HTML tags that I don't want.

How can I remove from a string, in JavaScript, all HTML tags like:

<anything ...>

or

</anything>

except in these few cases: <x ...>, </x> and <x ... />, where x is one of:

  • a
  • br
  • b
  • img

I thought about something like:

s.replace(/<[^a].*>/g, '');

but I'm not sure how to do it.

Example:

<div id="hello">Hello</div><a href="test">Youhou</a>

should become

Hello<a href="test">Youhou</a>

Note: I'm looking for a solution of a few lines of code that works 90% of the time (the email bodies come from my own emails, so I don't expect anything malicious), not for a full solution that would require a third-party tool or library.

3 answers:

Answer 0 (score: 2)

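Try replacing

<\/?(?!(a|br|b|img)\b)\w+[^>]*>

with nothing (an empty string):

  • <\/? matches the opening <, optionally followed by /
  • (?!(a|br|b|img)\b) is a negative lookahead that makes sure we don't match a, br, b or img tags
  • \w+[^>]*> matches the rest of the tag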

Here at regex101
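A minimal sketch of applying this pattern with String.prototype.replace. The g flag is needed to replace every match; the i flag is an assumption added here (not part of the answer) so that upper-case tags such as <A> are treated like <a>. The variable names are illustrative.

```javascript
// Answer 0's pattern; "g" replaces every match, "i" (added here) makes the
// allowlist in the lookahead case-insensitive.
var disallowedTag = /<\/?(?!(a|br|b|img)\b)\w+[^>]*>/gi;

var email = '<div id="hello">Hello</div><a href="test">Youhou</a>';
var cleaned = email.replace(disallowedTag, '');
console.log(cleaned); // prints Hello<a href="test">Youhou</a>

// The \b after the alternation stops <body> and <abbr> from slipping through:
// "b" and "a" are allowed, but in "body"/"abbr" they are followed by a word character.
var mixed = '<body><b>x</b><br/><abbr>y</abbr></body>'.replace(disallowedTag, '');
console.log(mixed); // prints <b>x</b><br/>y
```

Without the \b, the lookahead would see "b" at the start of "body" and wrongly keep the tag.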

Answer 1 (score: 1)

It's not very pretty, but it should meet your requirements:

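var cleaned = html.replace(/<\/?([^\s>\/]+)[^>]*>/gi, function (tag, tagName) {
    // Keep the tag if its name is in the allowlist, otherwise drop it (its text content stays).
    return ['a', 'b', 'br', 'img'].indexOf(tagName.toLowerCase()) >= 0 ? tag : '';
});

\/? matches an optional slash, ([^\s>\/]+) captures the tag name (stopping at /, so <br/> gives br), and [^>]* matches the attributes, spaces, etc.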
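The same callback idea can be wrapped in a reusable function that takes the allowlist as a parameter. This is a sketch: stripTagsExcept is a hypothetical name, not something from the answer, and it is written in ES5 style to match the question's ecmascript-5 tag.

```javascript
// Hypothetical helper: keeps only the tag names listed in `allowed`,
// drops every other tag, and leaves all text content in place.
function stripTagsExcept(html, allowed) {
  // Build a lookup object once so each tag check is a simple property test.
  var keep = {};
  for (var i = 0; i < allowed.length; i++) {
    keep[allowed[i].toLowerCase()] = true;
  }
  // Optional "/", then the tag name (no whitespace, ">" or "/"), then anything up to ">".
  return html.replace(/<\/?([^\s>\/]+)[^>]*>/g, function (tag, tagName) {
    // hasOwnProperty.call avoids matching inherited names such as "constructor".
    return Object.prototype.hasOwnProperty.call(keep, tagName.toLowerCase()) ? tag : '';
  });
}

var out = stripTagsExcept(
  '<div id="hello">Hello</div><a href="test">Youhou</a><br/><IMG src="x.png">',
  ['a', 'br', 'b', 'img']
);
console.log(out); // prints Hello<a href="test">Youhou</a><br/><IMG src="x.png">
```

Passing the list in keeps the regex itself generic, so changing which tags survive does not require editing the pattern.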

Answer 2 (score: -1)

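You can pass a function as the second argument to .replace; whatever it returns is used as the replacement for each match, so it decides what happens to the output.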

str.replace(/<[^>]*>/g, function (s) {
    // s is the matched tag; return s to keep it, or '' to remove it
    return s;
});

See the MDN documentation on replace:

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace
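To illustrate what the function receives, here is a small sketch (the pattern and the keep-only-<b> rule are made up for the demo). Per the MDN page, the callback gets the whole match first, then one argument per capture group, then the match offset and the full input string.

```javascript
var seen = [];

// Group 1 is the optional "/", group 2 is the tag name.
var result = '<p>Hi</p><b>there</b>'.replace(/<(\/?)(\w+)[^>]*>/g, function (match, slash, name, offset) {
  // Record where each tag starts and what it is.
  seen.push(offset + ':' + slash + name);
  // The return value replaces the match: keep <b> and </b>, drop everything else.
  return name.toLowerCase() === 'b' ? match : '';
});

console.log(result); // prints Hi<b>there</b>
console.log(seen.join(' ')); // prints 0:p 5:/p 9:b 17:/b
```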