How to remove encoded HTML tags from a string

Date: 2017-11-22 11:43:23

Tags: javascript regex

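I have a string like the one below, and I want to remove all the encoded HTML tags from it. I tried using a regular expression but could not get the desired output.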

var str = 'Replace all html codes &lt;p class="text-lg"&gt; from this string and return only text&lt;/p&gt;';

The desired output should look like this:

var output = 'Replace all html codes from this string and return only text';

2 Answers:

Answer 0: (score: 2)

We can reuse the function from another good answer to do what you need.

That answer contains the following function:

function strip(html){
  var doc = new DOMParser().parseFromString(html, 'text/html');
  return doc.body.textContent || "";
}

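If you run it on your string once, the encoded HTML comes back as regular HTML tags, because the parser decodes the entities into text. If you run the function a second time on that result, the tags themselves are removed.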

var str = 'Replace all html codes &lt;p class="text-lg"&gt; from this string and return only text&lt;/p&gt;';

function strip(html){
  var doc = new DOMParser().parseFromString(html, 'text/html');
  return doc.body.textContent || "";
}

console.log(strip(str));
console.log(strip(strip(str)));
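
DOMParser is a browser API, so the two-pass approach above does not run as-is in an environment without a DOM. As a rough sketch of the same decode-then-strip idea without the DOM, the snippet below decodes a small, fixed set of entities and then removes anything that looks like a tag. The names `ENTITIES`, `decodeEntities`, and `stripEncodedHtml` are hypothetical helpers introduced for illustration, and the regex-based tag removal is an assumption that only holds for simple markup (it is not a sanitizer and will misbehave on attribute values that contain `>`).

```javascript
// Hypothetical helper table: only these five entities are decoded.
const ENTITIES = { '&lt;': '<', '&gt;': '>', '&quot;': '"', '&#39;': "'", '&amp;': '&' };

function decodeEntities(text) {
  // Single pass, so "&amp;lt;" becomes "&lt;" rather than "<".
  return text.replace(/&(?:lt|gt|quot|#39|amp);/g, (match) => ENTITIES[match]);
}

function stripEncodedHtml(text) {
  // Step 1: turn "&lt;p&gt;" back into "<p>".
  const decoded = decodeEntities(text);
  // Step 2: drop anything shaped like an opening or closing tag.
  const withoutTags = decoded.replace(/<\/?[a-zA-Z][^>]*>/g, '');
  // Step 3: collapse the whitespace left behind where tags were removed.
  return withoutTags.replace(/\s{2,}/g, ' ').trim();
}

const encoded = 'Replace all html codes &lt;p class="text-lg"&gt; from this string and return only text&lt;/p&gt;';
console.log(stripEncodedHtml(encoded));
// Replace all html codes from this string and return only text
```

In a browser, the DOMParser version is the safer choice because it handles every entity and malformed markup; this sketch only covers the entities listed in the table.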

Answer 1: (score: 2)

Here is a regular expression that does it.

let x = 'Replace all html codes &lt;p class="text-lg"&gt; from this string and return only text&lt;/p&gt;';

let result = x.replace(/(&.*?&gt;)/g, "");
// Note: a double space remains where the opening tag was removed.
console.log(result); //==> Replace all html codes  from this string and return only text

Here is the regular expression on regex101: https://regex101.com/r/IzKHki

Explanation:
    1st Capturing Group (&.*?&gt;)
      & matches the character & literally (case sensitive)
      .*? matches any character (except for line terminators)
        *? Quantifier: matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)
      &gt; matches the characters &gt; literally (case sensitive)
    Global pattern flags
      g modifier: global. All matches (don't return after first match)
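
One caveat with a pattern that starts at any `&`: it also swallows unrelated entities such as `&amp;` together with the text up to the next encoded closing bracket. A slightly narrower variant, sketched below, only matches sequences that begin with `&lt;` followed by an optional `/` and a letter. The names `ENCODED_TAG` and `removeEncodedTags` are made up for this example, and the pattern assumes the tags are encoded exactly once and contain no encoded `&gt;` inside attribute values.

```javascript
// "&lt;", optional "/", a tag-name letter, then lazily anything up to "&gt;".
const ENCODED_TAG = /&lt;\/?[a-z][\s\S]*?&gt;/gi;

function removeEncodedTags(text) {
  return text.replace(ENCODED_TAG, '');
}

const sample = 'Tom &amp; Jerry &lt;b&gt;rule&lt;/b&gt;';

// Broad pattern: starts at "&amp;" and eats " Jerry " along with the tag.
console.log(sample.replace(/&.*?&gt;/g, ''));  // Tom rule

// Narrow pattern: leaves "&amp;" and the surrounding text alone.
console.log(removeEncodedTags(sample));        // Tom &amp; Jerry rule
```

If the input can contain ordinary entities next to encoded tags, the narrower pattern is the less surprising of the two.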