Question

If you have a string containing HTML entities and want to unescape it, this solution (or a variant of it) is suggested in many places:

function htmlDecode(input){
  var e = document.createElement('div');
  e.innerHTML = input;
  return e.childNodes.length === 0 ? "" : e.childNodes[0].nodeValue;
}

htmlDecode("&lt;img src='myimage.jpg'&gt;"); 
// returns "<img src='myimage.jpg'>"

(See, for example, this answer: https://stackoverflow.com/a/1912522/1199564)

This works fine as long as the string does not contain newlines and the code is not running on Internet Explorer earlier than version 10 (tested on versions 9 and 8).

If the string contains a newline, IE 8 and 9 replace it with a space character instead of leaving it unchanged (as Chrome, Safari, Firefox and IE 10 do).

htmlDecode("Hello\nWorld"); 
// returns "Hello World" on IE 8 and 9

Any suggestions for a solution that also works on IE versions before 10?

Answer 1

The simplest, though probably not the most efficient, solution is to have htmlDecode() act on character and entity references only:

var s = "foo\n&amp;\nbar";
s = s.replace(/(&[^;]+;)+/g, htmlDecode);
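
To see why this sidesteps the newline problem, it may help to look at which substrings the pattern actually selects. The sketch below is only an illustration: `markDecoded` is a hypothetical stand-in for htmlDecode() so that the snippet runs without a DOM, and it simply brackets whatever it receives.

```javascript
// Same pattern as above: one or more consecutive "&...;" references.
var referencePattern = /(&[^;]+;)+/g;

var sample = "foo\n&amp;\nbar";

// The substrings that would be handed to the decoder.
var matches = sample.match(referencePattern);
console.log(JSON.stringify(matches)); // ["&amp;"]

// Hypothetical stand-in for htmlDecode(): brackets its input so the
// replaced region is visible. It does not decode anything.
function markDecoded(match) {
  return "[" + match + "]";
}

var marked = sample.replace(referencePattern, markDecoded);
console.log(JSON.stringify(marked)); // "foo\n[&amp;]\nbar"
```

Only `&amp;` is passed to the callback; both newlines stay in the original string and are never assigned to innerHTML. Note that `[^;]` also matches a newline, so a stray `&` followed later by a `;` could still produce a match that spans lines.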

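More efficient is an optimized rewrite of htmlDecode() that is called only once per input, acts only on character and entity references, and reuses the DOM element object: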

function htmlDecode (input)
{
  var e = document.createElement("span");

  var result = input.replace(/(&[^;]+;)+/g, function (match) {
    e.innerHTML = match;
    return e.firstChild.nodeValue;
  });

  return result;
}

/* returns "foo\n&\nbar" */
htmlDecode("foo\n&amp;\nbar");

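Wladimir Palant has pointed out an XSS issue with this function: the value of some (HTML5) event listener attributes, like onerror, is executed if you assign HTML containing elements with those attributes to the innerHTML property. So you should not use this function on arbitrary input containing actual HTML, only on HTML that is already escaped. Otherwise, you should adapt the regular expression accordingly, for example by using /(&[^;<>]+;)+/ instead, to prevent &…; from being matched when … contains tags.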
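
The difference between the two patterns can be checked without touching the DOM. The following is a minimal sketch; the variable names and the `hostile` sample string are made up for illustration, and the markup in it is only ever treated as text here.

```javascript
// Original pattern: anything except ";" may appear between "&" and ";".
var loosePattern = /(&[^;]+;)+/;
// Stricter pattern from the text: "<" and ">" are excluded as well.
var strictPattern = /(&[^;<>]+;)+/;

// Hypothetical hostile input: markup placed between "&" and ";".
var hostile = "&<img src=x onerror=alert(1)>;";
// Already-escaped HTML, which is the intended kind of input.
var escaped = "&lt;b&gt;";

console.log(loosePattern.test(hostile));  // true  (the tag would reach innerHTML)
console.log(strictPattern.test(hostile)); // false (the tag is never matched)
console.log(strictPattern.test(escaped)); // true  (real references still match)
```

With the stricter pattern, text that contains tags is left untouched by the replace callback, so it is never assigned to innerHTML.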

For arbitrary HTML, please see his alternative approach, but note that it is not as compatible as this one.

Unescape HTML entities containing newline in Javascript?
