How to remove HTML tags from a string but keep HTML entities intact

Time: 2016-11-17 11:35:56

Tags: javascript html tags entity innerhtml

I want to remove HTML tags (elements) from a string without touching HTML entities such as &nbsp;, &amp;, &eacute;, &lt;, etc.

Right now I am using this:

var stringWithTag = "<i> I want to keep my ->&nbsp;<- element space, but remove the tags <b>Please Help</b></i>";
var div = document.createElement('div');
div.innerHTML = stringWithTag;

console.log("INPUT with html entity &nbsp;");
console.log(stringWithTag);

// textContent returns the decoded text, so &nbsp; comes back as a real
// non-breaking space character instead of the entity
var htmlNoTag = div.textContent || div.innerText || "";
console.log("\nOUTPUT that should still have entity &nbsp;, but not...");
console.log(htmlNoTag);

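See the jsfiddle: https://jsfiddle.net/az4st8LL/

But I always lose the entities (in this example &nbsp; should still be visible in the output, but it is not). If possible, I would like to avoid using a regular expression to remove all the HTML tags.

Does anyone have a solution?

Thanks,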

2 answers:

Answer 0 (score: 0)

You might want to consider using a regular expression (stolen from this answer):

string.replace(/<(?:.|\n)*?>/gm, '')

var stringWithTag = "<i> I want to keep my &nbsp; element space, but remove the tags <b>Please Help</b></i>";

console.log(stringWithTag.replace(/<(?:.|\n)*?>/gm, ''));

I cheated a little by removing the "->" and "<-" from your string: the regex matches everything between a "<" and the next ">", so those characters break the demo.
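If the "->" and "<-" need to stay in the string, one possible workaround is a stricter pattern that only treats "<" as the start of a tag when it is followed by a letter (optionally after a "/"). The sketch below is an assumption on my part, not part of the linked answer, and the helper name stripTags is hypothetical:

```javascript
// Hypothetical helper: removes anything that looks like an opening or closing tag.
// A "<" only counts as a tag start when a letter follows it (optionally after "/"),
// so a bare "<-" or "->" is left untouched.
function stripTags(html) {
  return html.replace(/<\/?[a-zA-Z][^>]*>/g, '');
}

var stringWithTag = "<i> I want to keep my ->&nbsp;<- element space, but remove the tags <b>Please Help</b></i>";

// Entities are never decoded because the string is not parsed as HTML.
console.log(stripTags(stringWithTag));
// " I want to keep my ->&nbsp;<- element space, but remove the tags Please Help"
```

This is still a regex and not an HTML parser: an attribute value that contains ">" or an HTML comment would not be handled correctly.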

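Answer 1 (score: 0)

You can use a regex that finds the tags and then work on the inverse, i.e. the text between them. Replace each of those text runs with its encodeURIComponent value, so the entities survive being parsed as HTML. Then, when you need the text, call decodeURIComponent on it.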
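To see why the encoding step protects the entities, here is a small round trip on just the text part. The variable names are only for illustration:

```javascript
var text = "->&nbsp;<-";

// encodeURIComponent escapes "&", ";", "<" and ">", so the HTML parser
// no longer sees anything that looks like an entity or a tag.
var encoded = encodeURIComponent(text);
console.log(encoded); // -%3E%26nbsp%3B%3C-

// Decoding afterwards restores the original text, entity included.
console.log(decodeURIComponent(encoded)); // ->&nbsp;<-
```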

var stringWithTag = "<i> I want to keep my ->&nbsp;<- element space, but remove the tags <b>Please Help</b></i><i> I want to keep my ->&nbsp;<- element space, but remove the tags <b>Please Help</b></i><i> I want to keep my ->&nbsp;<- element space, but remove the tags <b>Please Help</b></i><i> I want to keep my ->&nbsp;<- element space, but remove the tags <b>Please Help</b></i>";

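// Note: this pattern only matches tags with a single-character name, such as <i> or </b>
var tags = stringWithTag.match(/(<[^>]>|<\/[^>]>)/g);
var startIndex = 0;
var str = "";

// Walk over consecutive pairs of tags (p = previous, c = current) and
// URI-encode the text between them so the HTML parser leaves the entities alone
tags.reduce(function(p, c) {
  var i = stringWithTag.indexOf(p, startIndex) + p.length;
  // Search from i so that two identical consecutive tags are not confused
  var j = stringWithTag.indexOf(c, i);
  str += p + encodeURIComponent(stringWithTag.substring(i, j)) + c;
  startIndex = j;
  return c;
});

var div = document.createElement('div');
div.innerHTML = str;

//console.log("INPUT with html entity &nbsp;");
//console.log(stringWithTag);

var htmlNoTag = div.textContent || div.innerText || "";
//console.log("\nOUTPUT that should still have entity &nbsp;, but not...");
console.log(decodeURIComponent(htmlNoTag));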
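The same idea can also be written with split(), which keeps any text before the first tag and after the last one. This is only a sketch under my own assumptions: the helper name protectText and the tag pattern are hypothetical and not taken from the code above.

```javascript
// Assumed tag pattern: "<", optional "/", a letter, then anything up to ">".
// The capturing group makes split() keep the tags in the result array.
var TAG = /(<\/?[a-zA-Z][^>]*>)/;

// Hypothetical helper: URI-encodes every text run and leaves the tags as they are.
function protectText(html) {
  return html.split(TAG).map(function (part, i) {
    // With one capturing group, odd indices hold the tags, even indices the text.
    return i % 2 === 1 ? part : encodeURIComponent(part);
  }).join('');
}

var protectedHtml = protectText("lead <i>my ->&nbsp;<- space <b>Help</b></i> tail");
console.log(protectedHtml);
// lead%20<i>my%20-%3E%26nbsp%3B%3C-%20space%20<b>Help</b></i>%20tail
```

In a browser, the result can then be assigned to div.innerHTML, and decodeURIComponent(div.textContent) should give back "lead my ->&nbsp;<- space Help tail" with the entity still in place.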
