Question

I'm looking for a way to strip HTML tags from content in Google Apps Script.

Currently I'm using these functions to parse the HTML:

function getTextFromHtml(body) {
  return getTextFromNode(Xml.parse(body, true).getElement());
}

function getTextFromNode(x) {
 switch(x.toString()) {
  case 'XmlText': return x.toXmlString();
  case 'XmlElement': return x.getNodes().map(getTextFromNode).join('');
  default: return '';
 }
}

But for long HTML, this approach is very inefficient.

Sample HTML content: http://pastebin.com/FmB4hvN2

Any ideas?

Answer 1

This will remove all tags from the input.

 var text = html.replace(/<[^>]+>/g, "");
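
If the input is a full page rather than a fragment, the one-liner above leaves the contents of script and style blocks and any HTML entities in the result. A minimal sketch of one way to extend it, assuming plain JavaScript with no Apps Script services; `stripHtml` is a hypothetical helper name, and the entity list is deliberately short rather than complete:

```javascript
// Hypothetical helper (not from the original answer): regex-based tag stripping
// with a few extra clean-up passes. Not a real HTML parser, so malformed markup
// or a literal ">" inside an attribute value can still trip it up.
function stripHtml(html) {
  return String(html)
    // Drop <script> and <style> blocks together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, '')
    // Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, '')
    // Drop every remaining tag, same pattern as in the answer above.
    .replace(/<[^>]+>/g, '')
    // Decode a handful of common entities; &amp; goes last so that
    // "&amp;lt;" ends up as the text "&lt;" and not "<".
    .replace(/&nbsp;/g, ' ')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, '&')
    // Collapse runs of whitespace left behind by removed markup.
    .replace(/\s+/g, ' ')
    .trim();
}

var sample = '<p>Hello <b>world</b> &amp; friends</p><script>var x = "<i>";</script>';
var text = stripHtml(sample);
```

Each pass is a single linear scan over the string, so this should hold up better on long documents than walking a parsed node tree, at the cost of being less strict about what counts as a tag.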

Answer 2

If the content to be replaced is always wrapped in < and >, you can do:

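// C# (System.Text.RegularExpressions): the pattern goes to the constructor,
// and the input string goes to Replace.
Regex rgx = new Regex("<[^>]*>");
string result = rgx.Replace(someString, "");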

Stripping HTML tags in Google Apps Script
