I want to remove HTML tags (elements) from a string without touching HTML entities such as &nbsp;, &amp;, &eacute;, &lt;, etc.
Right now I am using this:
var stringWithTag = "<i> I want to keep my ->&nbsp;<- element space, but remove the tags <b>Please Help</b></i>";
var div = document.createElement('div');
div.innerHTML = stringWithTag;
console.log("INPUT with html entity &nbsp;");
console.log(stringWithTag);
var htmlNoTag = div.textContent || div.innerText || "";
console.log("\nOUTPUT that should still have entity &nbsp;, but not...");
console.log(htmlNoTag);
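See the jsfiddle: https://jsfiddle.net/az4st8LL/
But I always lose the &nbsp; entity (in that example it should still be visible, but it is not). If possible, I would like to avoid using a regular expression to remove all the HTML tags.
Does anyone have a solution?
Thanks,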
Answer 0 (score: 0)
You might want to consider using a regular expression (stolen from this answer):
string.replace(/<(?:.|\n)*?>/gm, '')
var stringWithTag = "<i> I want to keep my &nbsp; element space, but remove the tags <b>Please Help</b></i>";
console.log(stringWithTag.replace(/<(?:.|\n)*?>/gm, ''));
I cheated a little by removing the "->" and "<-" from your string: because the regex matches everything between a "<" and the next ">", those characters break the demo.
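To illustrate why the stray "<-" and "->" break the pattern above, here is a minimal sketch that runs in Node or a browser console. The second pattern, `strictTagPattern`, is my own assumption and does not come from the linked answer: it only treats "<" as the start of a tag when it is followed by a letter (optionally after "/"), so a literal "<-" is left alone. It is not an HTML parser and will still misbehave on comments or on attribute values that contain ">".

```javascript
// Input from the question, with the literal "->" and "<-" around the &nbsp; entity.
var input = "<i> I want to keep my ->&nbsp;<- element space, but remove the tags <b>Please Help</b></i>";

// Pattern from the answer: lazily matches everything from any "<" to the next ">".
var loosePattern = /<(?:.|\n)*?>/gm;

// Hypothetical stricter pattern: "<" or "</" must be followed by a letter to count as a tag.
var strictTagPattern = /<\/?[a-zA-Z][^>]*>/g;

var looseResult = input.replace(loosePattern, '');
var strictResult = input.replace(strictTagPattern, '');

// The loose pattern swallows the text between "<-" and the end of "<b>".
console.log(looseResult);  // " I want to keep my ->&nbsp;Please Help"

// The strict pattern removes only the four real tags and keeps the entity.
console.log(strictResult); // " I want to keep my ->&nbsp;<- element space, but remove the tags Please Help"
```

In both cases the &nbsp; entity survives, because a plain string replace never decodes entities. Only the DOM round trip through innerHTML does that.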
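Answer 1 (score: 0)
You can use a regex that searches for tags and invert it, so that you work on the text between the tags. Then replace that text in your string with its encodeURIComponent value. Then, when you need to use it, you can call decodeURIComponent.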
var stringWithTag = "<i> I want to keep my ->&nbsp;<- element space, but remove the tags <b>Please Help</b></i><i> I want to keep my ->&nbsp;<- element space, but remove the tags <b>Please Help</b></i><i> I want to keep my ->&nbsp;<- element space, but remove the tags <b>Please Help</b></i><i> I want to keep my ->&nbsp;<- element space, but remove the tags <b>Please Help</b></i>";
// A tag name must start with a letter, so the literal "<-" is not mistaken for a tag.
var tags = stringWithTag.match(/<\/?[a-zA-Z][^>]*>/g) || [];
var startIndex = 0;
var str = "";
tags.forEach(function (tag) {
  var j = stringWithTag.indexOf(tag, startIndex);
  // Encode the text before this tag so the HTML parser leaves its entities alone.
  str += encodeURIComponent(stringWithTag.substring(startIndex, j)) + tag;
  startIndex = j + tag.length;
});
// Encode whatever text follows the last tag.
str += encodeURIComponent(stringWithTag.substring(startIndex));
var div = document.createElement('div');
div.innerHTML = str;
// Reading the text content drops the tags; the encoded text passes through untouched.
var htmlNoTag = div.textContent || div.innerText || "";
console.log(decodeURIComponent(htmlNoTag));
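The reason the encoding round trip preserves entities can be checked without a DOM: once the text is percent-encoded it no longer contains "&", "<", or ">", so nothing is left for the HTML parser to read as an entity or a tag. A minimal sketch, with variable names that are illustrative only:

```javascript
// A text fragment as it appears between two tags in the question's string.
var text = "->&nbsp;<-";

// Percent-encoding replaces "&", ";", "<" and ">" with %XX escapes.
var encoded = encodeURIComponent(text);
console.log(encoded); // "-%3E%26nbsp%3B%3C-"

// Decoding afterwards restores the original text, entity included.
var decoded = decodeURIComponent(encoded);
console.log(decoded === text); // true
```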