Regex: strip HTML tags from a string, but keep email addresses enclosed in "<>"

Time: 2019-01-29 22:00:53

Tags: regex

I have looked through many StackOverflow questions, but none of them answers mine.

Basically, my input can be a string like this:

"From: 'Hima Chitalia (Hima- at Web Development)' via IWeb Development Support [mailto:hima@webdevelopement.com] 
<div>Sent: Monday, January 7, 2019 7:24 PM
</div><div>To: Hima Chhag (hc) <hchhag@wd.com>;
</div><div>Cc: Hima (hagain) <hagain@web.com>;
</div><div>Subject: RE: strip off HTML Tags but not email addresses
</div><div><br></div>"

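So, if the string contains any HTML tags, I need to replace them with an empty string. However, if it contains an email address enclosed in angle brackets, such as "<hchhag@wd.com>", that should be left as it is.

A few things I have tried: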

string.replace(/<[^<>]*>/g,'')
string.replace(/<[^>]*>/g,'')
string.replace(/<(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*>/g,'')

Actual result:

 "From: 'Hima Chitalia (Hima- at Web Development)' via Web Development Support [mailto:hima@webdevelopement.com] 
Sent: Monday, January 7, 2019 7:24 PM
To: Hima Chhag (hc) ; 
Cc: Hima (hagain) 
Subject: RE: strip off HTML Tags but not email addresses
"

Expected:

 "From: 'Hima Chitalia (Hima- at Web Development)' via Web Development Support [mailto:hima@webdevelopement.com] 
Sent: Monday, January 7, 2019 7:24 PM
To: Hima Chhag (hc) <hchhag@wd.com>; 
Cc: Hima (hagain) <hagain@web.com>
Subject: RE: strip off HTML Tags but not email addresses
"

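Any suggestions on how to solve this?

1 answer:

Answer 0 (score: 1)

Here you only need to match tags whose content contains no @, so the character class excludes @ as well as >: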

string.replace(/<[^@>]+>/g,'')
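To see why this works, here is a minimal sketch that runs the same pattern on a shortened version of the input from the question. The helper name `stripTagsKeepEmails` and the `sample` string are made up for illustration and are not part of the original answer.

```javascript
// Hypothetical helper wrapping the regex from the answer above.
function stripTagsKeepEmails(input) {
  // "<", then one or more characters that are neither "@" nor ">", then ">".
  // Something like <hchhag@wd.com> contains "@", so it can never match
  // and is left in place.
  return input.replace(/<[^@>]+>/g, '');
}

// Shortened sample based on the input in the question.
const sample =
  '<div>To: Hima Chhag (hc) <hchhag@wd.com>;\n' +
  '</div><div>Cc: Hima (hagain) <hagain@web.com>;\n' +
  '</div><div><br></div>';

console.log(stripTagsKeepEmails(sample));
// Prints:
// To: Hima Chhag (hc) <hchhag@wd.com>;
// Cc: Hima (hagain) <hagain@web.com>;
```

Note that this is a heuristic, not an HTML parser: a real tag whose attributes contain an @ (for example a link with a mailto: href) would also be left untouched, so it is probably only suitable for simple input like the one shown in the question.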